ABSTRACT


The purpose of this MITRE Sponsored Research project was to develop methods and measures for evaluating user-system interface effectiveness for command and control systems with graphical, direct manipulation style interfaces. Because user interface prototyping is increasingly used during the concept definition and demonstration/validation phases, human factors engineers have the opportunity to apply evaluation methodologies early enough in the life cycle to influence system design. Understanding and improving user-system interface (USI) evaluation techniques is critical to this process. In 1986, Norman proposed a descriptive "stages of user activity" model of human-computer interaction (HCI). Hutchins, Hollan, and Norman (1986) proposed concepts of measures, based on the model, that would assess the directness of the engagements between the user and the interface at each stage of the model. We created operational definitions of the concepts of directness and derived observable indicators that certain types of indirectness may exist in the interface design. This phase of our research program used these concepts as the basis for a methodology for analyzing data collected during usability studies. A usability study was performed on the Military Airspace Management System (MAMS) prototype; the data from four participants and one user interface expert were used for further analysis.

We first showed that, to assess concepts such as the directness of user-system interface engagements, we need to know both what users intended to do and what they actually did. This involves integrating data collected via different media (computer-collected keystrokes, transcribed user protocols, and video of the display output). A model-based, two-level encoding scheme was then created and applied to the usability data to aid in extracting and quantifying measures of USI effectiveness. The first level provides a high-level description of user activity, depicting users' task intentions, intentions to execute, errors by stage, and the success of their endeavors. The second level provides detailed information on the users' input activities at the level of user-interface objects. Combined, the two levels provide a complete description of what the users want to do, how they did it, and how directly the system allows them to do it. We then manually extracted our derived indicators of indirectness from each user's data and were able to perform a much more complete and quantifiable analysis of the user-system interface than would have been possible with more traditional evaluation methods. Examples of usability problems identified with this method are provided, and we discuss the need for a computer tool to make application of the method more efficient.


EXECUTIVE  SUMMARY 



INTRODUCTION 

The focus of the project Measures of User-System Interface Effectiveness is to study and validate methodologies and measures for analyzing the overall effectiveness of user-system interfaces (USI) for task performance. There is an increased emphasis on user-centered system design, which involves designing a system from the user's perspective: the concepts, objects, and actions embodied in the system closely match the user's task concepts, objects, and actions, allowing users to interact with the computer task domain in a direct way. This report, the third in a series of MSR reports, documents the evaluation methodology we developed for analyzing data collected in usability studies and provides examples of the method applied to a prototype system.

MEASURING GRAPHICAL, DIRECT MANIPULATION STYLE INTERFACES

The class of interfaces we were interested in evaluating was graphical, direct-manipulation style interfaces supporting ill-defined tasks. Ill-defined tasks are tasks that have more than one correct solution and for which alternative methods of performance exist. This class of applications includes scheduling tasks, mission planning tasks, and computer-aided architectural design tasks. These can be contrasted with well-defined tasks, such as some data entry tasks, where there is one correct solution; e.g., a document is entered into the system and edited until error free. The attributes of the interface (direct manipulation and graphical), together with the ill-defined nature of the tasks, make traditional USI evaluation measures less useful in terms of the feedback they provide. Traditional USI evaluation measures tend to be summary measures such as time to complete a task, percent of task completed, time spent in errors, percent or number of errors, and command frequency (Whiteside et al., 1988). These are gross measures; while various aspects of the interface will undoubtedly affect them, this type of measure alone does not provide enough granularity or diagnostic information about each user interaction with the system. Additionally, the concepts of direct manipulation raise a virtually unexplored problem: defining and measuring directness precisely enough that the definitions can be applied in practice. In summary, a method for assessing this class of user interfaces needs to be defined.




CONCEPTS  OF  SEMANTIC  AND  ARTICULATORY  DISTANCE 


Norman (1986) and Hutchins, Hollan, and Norman (1986) provide a good treatment of concepts of directness in user-system engagements. In their conceptual model of human-computer interaction they describe seven stages a user could traverse while accomplishing a goal with a computer: establishing the goal, intention formation, action specification, execution, perception, interpretation, and evaluation. They then define four concepts of distance that are critical to making a design user-centered: semantic and articulatory distance of execution, and semantic and articulatory distance of evaluation. Semantic distance of execution spans the intention formation stage and concerns whether the user can say what he/she wants to say directly with the computer system or whether a complex expression is required. Articulatory distance of execution spans the action specification stage and reflects how closely the form of the action to be executed corresponds to the meaning of the input expression. These are followed by the execution and perception stages, which span the translation from mental state to physical activity and back again. Articulatory distance of evaluation spans the interpretation stage and concerns how easily the meaning of the output expression can be extracted from its form. Semantic distance of evaluation concerns the ease with which users can determine whether they accomplished their goal.
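The correspondence between stages and distances described above can be written down as a small lookup table, which is convenient when a problem observed at a given stage has to be attributed to one of the four distances. The sketch below is illustrative only: the `Stage` enum, the `DISTANCE_SPANS` dictionary, and `distances_for` are hypothetical names, and assigning semantic distance of evaluation to the evaluation stage is our reading of the text rather than a definition taken from Hutchins et al.

```python
from enum import Enum


class Stage(Enum):
    """Norman's (1986) seven stages of user activity, in nominal order."""
    GOAL = "establishing the goal"
    INTENTION = "intention formation"
    ACTION_SPEC = "action specification"
    EXECUTION = "execution"
    PERCEPTION = "perception"
    INTERPRETATION = "interpretation"
    EVALUATION = "evaluation"


# Hypothetical lookup: the stage each of the four distances spans,
# following the descriptions in the text above.
DISTANCE_SPANS = {
    "semantic distance of execution": Stage.INTENTION,
    "articulatory distance of execution": Stage.ACTION_SPEC,
    "articulatory distance of evaluation": Stage.INTERPRETATION,
    "semantic distance of evaluation": Stage.EVALUATION,
}


def distances_for(stage):
    """Return the distances bearing on a problem observed at `stage`."""
    return [name for name, spanned in DISTANCE_SPANS.items() if spanned is stage]


# Execution and perception translate between mental state and physical
# activity; no distance is attached to them, so the second list is empty.
print(distances_for(Stage.ACTION_SPEC))
print(distances_for(Stage.PERCEPTION))
```

Keeping execution and perception outside the mapping mirrors the text: these stages carry the user between mental state and physical activity, while the four distances concern the psychological stages on either side.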

These concepts are complex and intriguing but still rather high-level. Characterizing a system by how well it supports the different stages, however, would provide the level of information needed to successfully iterate a design. We derived indicators, or behaviors, of indirectness for each stage based on Hutchins et al.'s concepts of directness; one set of indicators is shown below. Identifying the indicators requires collecting and evaluating user-system performance interaction by interaction, and the sequencing of engagements is important. We derived a model-based methodology that allows us to do this.

Causes of semantic indirectness of execution and evaluation and the corresponding observable indicators

Semantic indirectness of execution if:       Indicator
User intention not supported                 • Protocol stating desired function
                                             • Attempting to execute unsupported function, forced to abort
Missing high-level object                    • Same step or set of actions repeated on lower-level objects
Complex expression required                  • Many steps/actions required to complete intention
to accomplish intention                      • Errors in step order
                                             • Incomplete/aborted intentions



Semantic indirectness of evaluation if:      Indicator
Extra step/s required to                     • Number and purpose of steps performed (e.g., to get information, or "check" something)
perform an evaluation
Difficult or user unable to                  • Frequency and types of evaluation errors
perform an evaluation                        • Evaluation not made
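Some of these indicators lend themselves to a mechanical check once the data are encoded. The sketch below targets "same step or set of actions repeated on lower-level objects." It is a hypothetical helper, not part of the reported method (which extracted indicators by hand), and it assumes each intention to execute has been reduced to a tuple of articulatory-level codes plus the object it was applied to; the `Step` record and its fields are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One encoded intention to execute (hypothetical record layout)."""
    actions: tuple  # articulatory-level codes, e.g. ("Menu", "Command")
    target: str     # application object the step was applied to


def repeated_runs(steps, min_run=2):
    """Return (start, end) index pairs, end exclusive, for maximal runs of
    consecutive steps that repeat the same action sequence on a different
    target each time. Repeats on the same target (retries) are not counted."""
    runs = []
    i = 0
    while i < len(steps):
        j = i + 1
        while (j < len(steps)
               and steps[j].actions == steps[i].actions
               and steps[j].target != steps[j - 1].target):
            j += 1
        if j - i >= min_run:
            runs.append((i, j))
        i = j
    return runs


# Hypothetical log: the same approve sequence applied mission by mission.
log = [
    Step(("Menu", "Command"), "mission-A"),
    Step(("Menu", "Command"), "mission-B"),
    Step(("Menu", "Command"), "mission-C"),
    Step(("Button",), "dialog"),
]
print(repeated_runs(log))  # [(0, 3)]
```

A flagged run is only a candidate: the protocol data are still needed to confirm that the user wanted to act on the group as a whole rather than on each object separately.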


THE  METHODOLOGY 

The methodology consisted of four major steps. The first step was to conduct a usability study, in which real users exercise a system or prototype while evaluators collect data on the process. We determined that both verbal protocol data (where users are asked to voice their thoughts aloud) and time-stamped, computer-collected history logs (which record all of the user's input actions) are required to assess the four directness-of-engagement concepts. Protocols provide information about what a user intends to do, while the history log provides information about how the user did it. The latter is easier to collect and analyze but is ambiguous and insufficient if used alone.

A usability study was conducted using a prototyped airspace management scheduling system. Data were collected on seven participants, and the method was applied to the data from five of them. One of the participants was the USI design engineer for the project and served as our "user-interface expert" participant.

The second step involved integrating the collected data by combining the transcribed user protocols with the appropriate portions of the user's history file; this was done manually.
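In the study this integration was done by hand. If both streams carry time stamps on a common clock (an assumption: the history log is time-stamped, but protocol time stamps would have to be taken from the video record), the integration reduces to merging two time-sorted sequences. The minimal Python sketch below shows that idea; the `Event` record, `integrate`, and the example entries are hypothetical.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Event:
    time: float                          # seconds since session start (assumed shared clock)
    source: str = field(compare=False)   # "protocol" or "history"
    text: str = field(compare=False)     # utterance or logged input action


def integrate(protocol, history):
    """Interleave transcribed utterances with history-log entries in time order.

    Both inputs must already be sorted by time; heapq.merge then interleaves
    them without re-sorting the combined record."""
    return list(heapq.merge(protocol, history))


# Hypothetical entries from one session.
protocol = [
    Event(12.0, "protocol", "I want to schedule this mission"),
    Event(30.5, "protocol", "now I need to check for a conflict"),
]
history = [
    Event(14.2, "history", "Menu: Schedule"),
    Event(15.0, "history", "Command: Approve"),
    Event(31.0, "history", "Button: OK"),
]
for event in integrate(protocol, history):
    print(f"{event.time:6.1f}  {event.source:<8}  {event.text}")
```

Each utterance then sits directly before the input actions that follow it, which is the pairing of intention and action that the encoding step needs.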

The third step involved developing a two-level encoding scheme, based on Norman's model, and applying it to the data. The first level of the encoding scheme provides a high-level description of user activity, depicting users' task intentions, intentions to execute, errors by stage, and the success of each endeavor. The second level provides detailed information on the users' input activities at the level of user-interface objects. Combined, the two levels provide a complete description of what the users want to do, how they did it, and how well they did it. The codes and their descriptions are shown in the tables below.


Semantic-Level Encodings

Encoding                                   Definition
Goal                                       Scenario step.
Task intention (Int.task)                  An intention to complete one task contributing to the completion of a goal.
Perception intention (Int.per)             An intention to improve the perceptibility of a display.
Intention to execute (Int.exe)             One computer step (may consist of multiple actions) leading to the completion of a task intention. Several steps may be required per task intention.
Evaluate (Eval)                            The success with which the intention was accomplished.
Error in intention (Err.int)               The intention was incorrect and will not accomplish the goal.
Error in action specification (Err.acsp)   Wrong sequence of actions to accomplish the intention to execute.
Error in execution (Err.exec)              Manual, motor error in executing.
Error in perception (Err.per)              Breakdown in human perceptual processing of information on a display.
Error in interpretation (Err.inter)        User fails to interpret system state correctly.
Error in evaluation (Err.eval)             User mistakenly thinks he/she has or has not moved closer to the goal.
Recovered error (Rec.err)                  Error was detected and recovered from.

Articulatory-Level Encodings

Encoding                        Definition
Menu                            A menu was opened
Command                         A command was selected
List-Select                     An item was selected from a list
Button                          A button was selected
Field                           An action was taken in a field
Scroll                          A scroll bar action was performed
Window                          A window action was performed
Application-specific objects    Encodings to track the manipulation of application-specific objects


The data were encoded with the aid of a tool called SHAPA, developed at the University of Illinois at Urbana-Champaign.
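For readers who want to hold the two-level scheme in software, the following is a minimal sketch of one possible record layout, assuming one record per task intention that contains its intentions to execute. The class names, the `App:` prefix for application-specific objects, and the validation rules are hypothetical; they are not SHAPA's format. The code sets are copied from the two tables above.

```python
from dataclasses import dataclass, field

# Code sets taken from the two encoding tables above.
SEMANTIC_ERROR_CODES = {
    "Err.int", "Err.acsp", "Err.exec", "Err.per", "Err.inter", "Err.eval",
    "Rec.err",
}
ARTICULATORY_CODES = {
    "Menu", "Command", "List-Select", "Button", "Field", "Scroll", "Window",
}
APP_PREFIX = "App:"  # hypothetical convention for application-specific objects


@dataclass
class IntentionToExecute:
    """Second level: the articulatory actions making up one computer step."""
    label: str
    actions: list = field(default_factory=list)

    def __post_init__(self):
        bad = [a for a in self.actions
               if a not in ARTICULATORY_CODES and not a.startswith(APP_PREFIX)]
        if bad:
            raise ValueError(f"unknown articulatory codes: {bad}")


@dataclass
class TaskIntention:
    """First level: one task intention (Int.task) with its steps (Int.exe),
    any stage-coded errors, and the outcome of its evaluation (Eval)."""
    label: str
    steps: list = field(default_factory=list)
    errors: list = field(default_factory=list)
    evaluation: str = "OK"

    def __post_init__(self):
        bad = [e for e in self.errors if e not in SEMANTIC_ERROR_CODES]
        if bad:
            raise ValueError(f"unknown semantic-level error codes: {bad}")

    def action_count(self):
        """Total articulatory actions across all intentions to execute."""
        return sum(len(step.actions) for step in self.steps)


# Hypothetical encoding of one task intention with two steps.
task = TaskIntention(
    "schedule mission",
    steps=[
        IntentionToExecute("look at mission", ["Menu", "Command"]),
        IntentionToExecute("approve mission", ["App:mission-icon", "Button"]),
    ],
    errors=["Err.exec", "Rec.err"],
)
print(task.action_count())
```

Nesting the second level inside the first keeps the link between what the user intended and how it was carried out, which is what the indicators of indirectness depend on.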

The fourth step in the evaluation methodology involved extracting the indicators of interest from the encoded data files and comparing them across users. To make the extracted information easy to record, we created a data summarization table. For each user task intention, the critical information is summarized in a form that allows easy comparison across subjects.
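Filling in the summarization table is mostly bookkeeping, so it is a natural target for the support tool discussed later in this report. The sketch below builds rows shaped like the table's columns (task intention, frequency, intention to execute, frequency, actions per intention to execute, and the two evaluations) from encoded data. The tuple layout of the input and the function name `summarize` are assumptions, and the example data are hypothetical rather than taken from the study.

```python
from collections import Counter


def summarize(task_intentions):
    """Build rows of a data summarization table from encoded data.

    `task_intentions` is a list of (task_label, task_eval, steps) tuples and
    each step is an (exec_label, n_actions, exec_eval) tuple; this layout is
    an assumption of the sketch. Task-level cells are filled only on the
    first row of each task, and the task evaluation only on its last row."""
    task_freq = Counter(label for label, _, _ in task_intentions)
    rows = []
    for label, task_eval, steps in task_intentions:
        exec_freq = Counter(name for name, _, _ in steps)
        for k, (name, n_actions, exec_eval) in enumerate(steps):
            rows.append({
                "int.task": label if k == 0 else "",
                "task freq": task_freq[label] if k == 0 else "",
                "int.exec": name,
                "exec freq": exec_freq[name],
                "actions": n_actions,
                "eval int.exec": exec_eval,
                "eval int.task": task_eval if k == len(steps) - 1 else "",
            })
    return rows


# Hypothetical data for one task intention carried out in three steps.
rows = summarize([
    ("schedule A", "OK", [("look A", 2, "OK"), ("move A", 1, "OK"), ("schedule A", 1, "OK")]),
])
for r in rows:
    print(f"{r['int.task']:<12}{r['int.exec']:<12}{r['actions']:>3}  "
          f"{r['eval int.exec']:<4}{r['eval int.task']}")
```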


An  excerpt  from  a  real  participant's  summary  table  for  the  task  "schedule  missions"  is 
shown  in  the  figure  below. 


Int.task           Freq  Int.exec          Freq  # actions      Eval of    Eval of    Errors
                                                 per int.exec   int.exec   int.task
2-24 resconf       1     lookthawk/sdt-w   1     4              OK
                         thawk/sdt-w
                         lookwpn-w         1     1              OK
                         movesdt-w         1     6              OK         OK
2-25 schsdt-w      1     schsdt-w          1     3              OK         OK
2-26 schwon-w      1                       1     2              OK         OK
2-27 sch1240026-w  1     look1240026-w     1     2
                         sch1240026-w      1     1              OK         OK
2-30 schfox-w      1     lookfox-w         1     2              OK
                         movefox-w         1     1              OK
                                                                                      e14 - conflict state, R
                         schfox-w          1     1              OK         OK



Additionally, we collected the same data and completed the same form for the system's user-interface expert, to provide a baseline of expert system performance. The expert's data illustrate the best the system can do. The real users' data illustrate the ease with which users could accomplish their tasks with this system and the directness of their engagements. Comparing data across subjects allows us to distinguish between system-induced problems (more than one user has the same difficulty), effects of training (only the least-experienced subjects had the problem), and individual user problems (only one user had that type of problem). Examples of the type of information we were able to extract for one goal, "schedule missions," are summarized below.

Indicator / Potential Problem

• Indicator: Repetitive sequences for applying the approve command to missions.
  Potential problem: Groups of objects cannot be selected for application of a single command.

• Indicator: Repetitive sequences for applying the approve command to parts of a single mission.
  Potential problem: The system does not treat a mission as an object when scheduling commands are applied.

• Indicator: An abort while trying to bring up all parts of the "dact" mission on the display.
  Potential problem: The system does not treat mission parts as an object when the user tries to find the whole mission.

• Indicator: An extra intention to execute is required to "look" when the task intention is to schedule a mission.
  Potential problem: Information in a dialog box is often required before a mission can be approved. To increase the feeling of directness, the two steps should be combined in some manner.

• Indicator: Perceptual/execute errors.
  Potential problem: When missions were too close together, users would select the wrong one. There was no way to differentiate missions when the labels were very small.

• Indicator: Execute error, many actions for recovery.
  Potential problem: A user selects deny from the menu rather than the adjacent approve command. The lack of an undo feature forces the user to perform multiple actions to fix the error.
  Potential problem: Mission icons were often accidentally moved, and users then had to reposition them manually. There are two problems: the icons were too sensitive, and there is no undo feature, so multiple actions are needed to undo a previous action.
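The cross-subject comparison described earlier (system-induced problems, effects of training, and individual user problems) can be stated as a small decision rule. The sketch below is one possible encoding of it; the function name and participant IDs are hypothetical, and checking the training rule before the single-user rule is our assumption, so a problem seen only in one least-experienced subject is attributed to training rather than to that individual.

```python
def classify_problem(affected, least_experienced):
    """Attribute a usability problem from which participants showed it.

    affected: set of participant IDs whose data showed the problem's indicator.
    least_experienced: set of participant IDs with the least system experience.
    Checking the training rule before the single-user rule is an assumption."""
    if not affected:
        return "none"
    if affected <= least_experienced:
        return "training"        # only the least-experienced subjects had it
    if len(affected) == 1:
        return "individual"      # only one user had it
    return "system"              # more than one user had the same difficulty


novices = {"P3", "P4"}  # hypothetical participant IDs
print(classify_problem({"P1", "P2", "P4"}, novices))
print(classify_problem({"P3", "P4"}, novices))
print(classify_problem({"P2"}, novices))
```

The expert's data are not used by this rule; they remain a separate baseline showing the best the system can do.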

Finally, an error taxonomy based on the stages in the HCI model was created and applied to the recorded instances of errors. This information is provided in Appendix D.
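Because each semantic-level error code already names a stage, a first-cut tally of errors by stage follows directly from the encoded data. The sketch below is only such a tally keyed on the encoding codes; it is not the Appendix D taxonomy, and the `ERROR_STAGE` mapping and `errors_by_stage` function are hypothetical names.

```python
from collections import Counter

# Semantic-level error codes mapped to the stage of user activity they denote,
# following the encoding table.
ERROR_STAGE = {
    "Err.int": "intention formation",
    "Err.acsp": "action specification",
    "Err.exec": "execution",
    "Err.per": "perception",
    "Err.inter": "interpretation",
    "Err.eval": "evaluation",
}


def errors_by_stage(codes):
    """Count stage-coded errors in a sequence of semantic-level codes.

    Codes that are not errors (e.g. Int.task, Eval) are ignored; recovered
    errors (Rec.err) are counted separately."""
    per_stage = Counter(ERROR_STAGE[c] for c in codes if c in ERROR_STAGE)
    recovered = sum(1 for c in codes if c == "Rec.err")
    return per_stage, recovered


per_stage, recovered = errors_by_stage(
    ["Int.task", "Err.exec", "Rec.err", "Err.per", "Err.exec", "Eval"])
print(per_stage.most_common(), recovered)
```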




ACCOMPLISHMENTS 


The results obtained to date on measures of user-system interface effectiveness are promising. We developed a method that allows us to obtain information on the directness of user engagements with a system. The method integrates protocol data with history file data to give a complete and useful picture of human-computer interaction (HCI) activity. We created a theory-based encoding scheme that supports quantitative analysis of the data. We also created an error classification scheme based on the stages of user activity model, which provides much more information on why an error occurred, and on how to fix the system to prevent it, than traditional error frequency measures can. Indicators of user-system interface (USI) effectiveness extending beyond errors and time were derived and found to be useful. We have shown that a USI engagement can be error-free yet not direct, so new measures and indicators such as those described here are required for a complete evaluation. The measures are also in a form that allows easy comparison across subjects. This technique also allows us to determine whether difficulties are due to a single user's inexperience or whether problems can be attributed to the system design.

FUTURE  WORK 

Measures  and  Indicators 

We need to do several things to refine the USI measures and indicators. First, we need more rigorous definitions of the different levels of the encodings; when, for example, is something a task intention as opposed to an intention to execute? While we tried to be consistent in applying these terms, it was difficult, particularly as this was the first time we applied the scheme. The same problem holds for the level of detail of the intentions to execute. Sometimes all actions within a dialog box were considered to be a single intention to execute, and sometimes particular actions were broken out separately. This may need to be flexible, depending on which areas of the USI are to be evaluated.

We would like to continue to work on the definitions and names for the different USI indicators. There also seem to be many kinds of repetition that indicate different types and levels of problems. We would like to classify these kinds of repetition and determine what they imply for the system design. Finally, we need to apply the encoding scheme to a different system to ensure that it is generic across systems, and continue to refine it.

The time required to apply our evaluation method could be greatly shortened by creating a software system dedicated to supporting this method. The system will be multimedia and will aid in integrating the history file and the users' intentions. We may be able to remove the step of transcribing all of the users' protocols and instead extract just the information needed for the intentions, evaluations, etc. We currently plan to build this tool in FY93.
SECTION 1


INTRODUCTION 

The focus of the project Measures of User-System Interface Effectiveness is to study and validate methodologies and measures for analyzing the overall effectiveness of graphical direct-manipulation user-system interfaces (USI) for task performance. There is an increased emphasis on user-centered system design. This involves designing a system from a user's perspective so that the concepts, objects, and actions embodied in a system closely match the user's task concepts, objects, and actions, thus allowing users to interact with the computer task domain in a more direct way. Little work has been done on operationally defining concepts of directness or on defining evaluation techniques for assessing directness. This paper documents the results of using an existing human-computer interaction framework to develop cognitive-based measures of the directness of engagements, and the results of applying this new approach to an actual system. Section 1 provides background information on the project and the rationale for this work, and introduces the theories and models on which the work was based. Section 2 discusses the new scheme for assessing graphical user interfaces at a prescriptive and cognitive level. Section 3 provides the results of applying the new evaluation methodology to a prototyped system. Section 4 concludes with a review of the advantages of the new method and discusses additional areas of research required, particularly the need for a tool to support application of the method.

1.1  FY92  RESEARCH  PROGRAM 

The plan for the FY92 MSR project is shown in figure 1. The first step was to review models of human-computer interaction (HCI), review user-system interface (USI) evaluation techniques and data analysis tools, and derive concepts for potential USI measures based on the models of human cognition and HCI. Volume I of this MSR report series (MTR 92B0000047) documented the results of this first phase of the research. From the review of USI models, measures, and evaluation techniques, we decided to focus further study on structured judgement techniques and user-based evaluations, as these appeared most suitable for application to command and control (C2) systems. The issue to be resolved with the structured judgement techniques was whether they assessed USI problem areas on the newer graphical, direct-manipulation style interfaces. For user-based evaluations, the concern was whether traditional measures captured the USI effectiveness of graphical, direct-manipulation style interfaces. The second and third phases of the research project addressed these issues.

In phase 2, structured judgement evaluation techniques were reviewed and applied to predict where users might encounter interaction difficulties during task performance. Structured judgement techniques involve an expert in USI design assessing a user interface; users are not involved. These evaluation techniques were applied to the Military Airspace Management System (MAMS), a prototype of a military airspace scheduling system. This system served as our application system for the entire study and was selected because it has a graphical, direct-manipulation style interface. The three structured judgement techniques used were cognitive walkthrough, heuristic evaluation, and guidelines. We found that the cognitive walkthrough method applied almost exclusively to analyzing the user's computer-input actions. The guidelines were more generally applicable across the various stages of human-computer interaction, but all the techniques were weak in measuring activities involving display perception, interpretation, and evaluation, and in assessing how similar the concepts encoded by the computer language are to the way users think about these concepts. The conclusion was that improvements to existing techniques, or new techniques, are required for evaluating the directness of engagements for graphical, direct manipulation style interfaces. Phase 2 of this research was documented in Volume II of this MSR report series (MTR 92B0000047V2).

Finally, the third phase of the research involved conducting a user-based evaluation and developing new measures of USI effectiveness for analyzing data collected during such evaluations. These results are documented in this report. Below we discuss why measures and indicators of USI effectiveness are needed, and we introduce the concept of direct manipulation style interfaces. This is followed by a review of Norman's and Hutchins, Hollan, and Norman's theories on stages of user activity and the concepts of semantic and articulatory distance.
Figure 1. Overall Research Plan.
1.2 RATIONALE FOR COGNITIVE-BASED USI EFFECTIVENESS MEASURES

Several changes in the area of computers and user-system interface design create the need for cognitive-based measures of USI effectiveness. Computers are increasingly used to support dynamic, interactive tasks in which the user's mind is an important component of the total system (Landauer, 1987). Users no longer simply use a computer as a tool to perform independent tasks by giving instructions to the computer and waiting for a reply. Object-oriented programming allows the task domain world to be presented graphically by the computer for direct interaction with the user. As more and more applications adopt graphical interfaces, we need a cognitive-based method of evaluating such user interfaces to measure their effectiveness.

A primary usability feature of a graphical user interface is the directness with which a user can manipulate and control a software system. Shneiderman (1982, 1983, in Ziegler et al., 1988) characterizes directness as:

•  Continuous  representation  of  the  object  of  interest 

•  Physical  actions  or  labeled  button  presses  instead  of  complex  syntax  and  command 
names 

• Rapid, incremental, reversible operations whose impacts on the object of interest are immediately visible.

These types of interfaces are hypothesized to be "easier to use" than dialogue-style interfaces because the basic functionality can be quickly learned and actions are immediately reversible. This does not, however, explain how the directness of a user interface can be designed and evaluated.

The concept of direct manipulation (DM) is relatively complex (Hutchins, Hollan, and Norman, 1986). Hutchins et al. note that despite the promise of this concept, there is no account of how particular properties might produce the feeling of "directness." Intuitively, directness can be described in terms of the type and number of mental steps required by the user to achieve a desired goal. For example, if we want something to be displaced by two inches, we move it two inches; our actions directly mimic our intentions.

The  high-level  issues  relating  to  directness  and  evaluating  directness  are: 

• Does the computer represent the task domain in the same way the user conceptualizes the task domain?

• Are the objects in the system at the level of task object the user expects?

• Is the USI directly supporting the user's cognitive processes during task performance?

•  Is  the  user  forced  to  relearn  how  to  do  a  task  to  suit  the  computer's  model  of  the  task 
domain? 

•  Does  the  USI  use  terminology  and  icons  unfamiliar  to  the  user? 

•  Can  poor/good  interactive  sequences  between  the  human  and  the  computer  be  identified? 

Norman (1986) provides a generic model or framework that describes the mental steps associated with the execution of a higher-level goal. This framework and its related concepts, presented below, provide a means of discussing and structuring the issues associated with directness.

1.3  STAGES  OF  USER  ACTIVITY  MODEL 

When examining human-computer task execution, Norman (1986) discusses the discrepancy between a human's psychologically expressed goals and the physical controls and variables of a computer system. Once a human has a goal (a state a person wishes to achieve), the human must translate this goal into the desired system state, determine what settings will achieve this state, and then determine what physical manipulations are required. Following the execution of the required manipulation and the system response, the human must then transform the physical variables of the system state into expressions that are relevant to the psychological variables of the goals. This is a feedback loop in which the results of one activity are used to direct further activities.

This theory underlies Norman's stages of user activity model. Norman took the approach of dividing user tasks into distinct segments. He uses the metaphor of bridging a gulf to describe the issue of dealing with the psychological and physical variables in computer system design and evaluation. The "Gulf of Execution" represents the gap between psychological goals and the physical system. The four segments that bridge the "Gulf of Execution" are: intention formation, specification of an action sequence, executing an action, and making contact with the data entry mechanisms of the user interface (figure 2).

The "Gulf of Evaluation" represents the gap between the physical system and the original psychological goals and intentions. The four segments that bridge the "Gulf of Evaluation" are: user interface display output, perceptual processing of display output, interpretation of display output, and evaluation (the comparison of the system state, as represented by display output, with the original goals and expectations) (figure 2). Note that "display output" could be auditory or even tactile; it need not be limited to a visual display.
Figure 2. The Stages of User Activity

From  User  Centered  System  Design:  New  Perspectives  on  Human-Computer  Interaction 
(p.42)  by  D.  A.  Norman  and  S.  W.  Draper,  1986,  Hillsdale,  New  Jersey:  Lawrence  Erlbaum 
Associates,  Inc.  Copyright  1986  by  Lawrence  Erlbaum  Associates.  Reprinted  by  permission. 

Forming an intention is the activity that specifies the meaning of the input expression that is
to satisfy the user's goal. The action specification prescribes the form of an input expression
having the desired meaning. These two activities are psychological activities. The form of the
input expression is then executed by the user on the computer interface, and the form of the
output expression appears on the display, to be perceived by the user. Interpretation
determines the meaning of the output expression from the form of the output expression.
Evaluation assesses the relationship between the meaning of the output expression and the
user's goals (Hutchins, Hollan, and Norman, 1986). The last two stages are also
psychological activities.

Norman concedes that real activity does not progress as a simple sequence of stages.
Stages of activity sometimes appear out of order; some stages are skipped while others
may be repeated. The stages on the evaluation side, for instance, may be occurring
almost continuously at some level during an interaction sequence.
1.4  CONCEPTS  OF  DISTANCE  AND  DIRECTNESS


Hutchins et al. (1986) further elaborated Norman's model and suggested that directness can
be derived from the degree of mental transformation required to span one's thoughts about the
task and the physical requirements of the system, and from the qualitative feeling that one is directly
manipulating the objects of interest in the task domain. They termed these concepts the semantic
and articulatory distances of execution and evaluation. Below we describe these concepts,
which are also illustrated in figure 3.

1.4.1  Semantic  Distance/Directness 

Semantic distance of execution involves matching the level of description required by the
interface language to the level at which the person thinks of the task. Distance reflects how
well the computer language encodes concepts as the user thinks of them: can a concept be
expressed directly, or is a complicated expression required? The resulting semantic distance can
be measured by how much of the required structure is provided by the system and how much
by the user; the more the user provides, the greater the distance (Hutchins et al., 1986). As
distance increases, directness decreases. Semantic distance is related to the "nouns" and "verbs",
or "objects" and "actions", provided by a computer system. Suppose, for example, a user of a
scheduling system wishes to schedule Crew A to fly a mission. If the system only provides
individual crew members as the objects the user can manipulate, the user would have to
identify the individual members of Crew A and repeatedly perform the action which schedules
each crew member for a mission, resulting in a complicated expression to achieve a simple
concept. The user is being forced to work at a lower level than desired, resulting in greater
semantic distance. For execution, forming an intention is the activity that spans semantic
distance. The intention specifies the meaning of the input expression that is to satisfy the
user's goal or subgoal.
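As a rough illustration of gauging semantic distance by how much structure the user must supply, the sketch below contrasts the two versions of the crew-scheduling interface. It is a minimal sketch, not a model of any real system: the `Schedule` class, its methods, and the crew member names are hypothetical, and counting the scheduling actions the user performs is only one assumed proxy for the structure the user provides.

```python
from dataclasses import dataclass, field


@dataclass
class Schedule:
    """Hypothetical scheduling board holding (crew member, mission) entries."""
    entries: list = field(default_factory=list)

    def schedule_member(self, member: str, mission: str) -> None:
        # Low-level object: only an individual crew member can be scheduled.
        self.entries.append((member, mission))

    def schedule_crew(self, members: list, mission: str) -> None:
        # High-level object: the system supplies the grouping of members into a crew.
        for member in members:
            self.entries.append((member, mission))


# Hypothetical members of Crew A (illustrative names only).
CREW_A = ["Adams", "Baker", "Chen", "Diaz"]

# Member-only interface: the user supplies the grouping and repeats the action.
low_level = Schedule()
user_actions_low = 0
for member in CREW_A:
    low_level.schedule_member(member, "Mission 1")
    user_actions_low += 1

# Crew-level interface: one action expresses the whole concept.
high_level = Schedule()
high_level.schedule_crew(CREW_A, "Mission 1")
user_actions_high = 1

# Both interfaces reach the same system state; only the user's share of the work differs.
assert low_level.entries == high_level.entries
print(user_actions_low, user_actions_high)  # 4 1
```

The count understates the member-only case, since the user must also first work out who belongs to Crew A before any scheduling action is taken.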

Semantic distance also occurs on the evaluation side of the interaction cycle. Here,
semantic distance is proportional to the amount of processing required by the user to determine
whether the goal has been achieved. If the terms of the computer output do not match those of
the user's intention, or if output is lacking, translation or inference by the user is needed,
increasing the semantic distance (Hutchins et al., 1986). Using the same example as above, if
the user was allowed to schedule "Crew A" but the output of the action showed only the individual
crew members who were scheduled, not grouped or identified by crew name, the user
might be uncertain whether his/her intention was fulfilled. For evaluation, the stage of
evaluation spans semantic distance.

Instances  where  a  system  does  not  support  a  user  intention  or  evaluation,  even  indirectly, 
are  also  of  interest. 
Figure 3. Semantic and Articulatory Distance of Execution and Evaluation

From  User  Centered  System  Design:  New  Perspectives  on  Human-Computer  Interaction 
(p. 111) by D. A. Norman and S. W. Draper, 1986, Hillsdale, New Jersey: Lawrence Erlbaum
Associates, Inc. Copyright 1986 by Lawrence Erlbaum Associates. Reprinted by permission.


1.4.2  Articulatory  Distance/Directness 

Whereas semantic distance relates to the relationships between users' intentions and the meanings
of expressions, articulatory distance relates to the relationships between the meanings of
expressions and their physical form (Hutchins et al., 1986).

On the execution side, the form may be keystrokes, mouse movements, and clicks. On the
evaluation side, the form may be a string of characters, an icon or shape, or an auditory signal.
The idea is to reduce the number of arbitrary relationships between the physical forms and the
expressions' meanings (Hutchins et al., 1986). Using the example above, an articulatory
direct execution for scheduling a crew would be to "grab" the crew icon with the mouse and
place it on a graphically presented scheduling board. An articulatory direct evaluation would
occur if that same crew icon then appeared on the schedule at the correct time. Forming an
action specification is the activity that spans articulatory distance. For evaluation, interpretation
spans the articulatory distance.
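The correspondence between stages and distances stated in the two subsections above can be summarized as a small lookup table. This sketch only restates the text; the enum and function names are illustrative assumptions, not part of the method.

```python
from enum import Enum


class Side(Enum):
    EXECUTION = "execution"
    EVALUATION = "evaluation"


class Distance(Enum):
    SEMANTIC = "semantic"
    ARTICULATORY = "articulatory"


# Stage of user activity that spans each (distance, side) pair, as described in the text.
SPANNING_STAGE = {
    (Distance.SEMANTIC, Side.EXECUTION): "forming an intention",
    (Distance.ARTICULATORY, Side.EXECUTION): "forming an action specification",
    (Distance.SEMANTIC, Side.EVALUATION): "evaluation",
    (Distance.ARTICULATORY, Side.EVALUATION): "interpretation",
}


def spanning_stage(distance: Distance, side: Side) -> str:
    """Return the stage that must bridge the given distance on the given side of the cycle."""
    return SPANNING_STAGE[(distance, side)]


if __name__ == "__main__":
    for (distance, side), stage in SPANNING_STAGE.items():
        print(f"{distance.value} distance of {side.value}: spanned by {stage}")
```

Read in reverse, the same table tells an analyst which kind of distance is implicated when a problem is traced to a particular stage.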
The articulatory directness a system can provide is closely tied to the technology available. Simple
keyboards and small, low-resolution screens limit the form and structure of the input and
output forms, respectively. A mouse, for example, is spatio-mimetic, meaning it can provide
articulatory direct input for tasks that can be represented spatially. Pictographs and icons are
examples of output forms which are related to their meanings (Hutchins et al., 1986).

Norman notes that with practice and experience, crossing the "gulfs" can become easier, but
this does not mean that the distances have been reduced. Instead, the distances have been
bridged by the users, not the system. This implies a need to distinguish between a feeling of
directness which originates from close semantic coupling between intentions and the interface
language, and one which originates from practice. We believe that symptoms resulting from
semantic and articulatory distance can best be seen during the learning stages of HCI, before
the distance/gulf is bridged by experience.

1.5  GOALS  OF  THIS  RESEARCH  EFFORT 

We have discussed high-level concepts that are important to the usability of graphical user
interfaces. To evaluate such interfaces effectively, we need to extract information
which indicates the presence of the various types of distance. Traditional human
factors/usability measures alone are not sufficient to address these issues adequately or to deal
with graphical user interfaces. Traditional measures consist of high-level summary measures
such as task completion time, number and percent of errors, command frequency, percent of
task completed, frequency of referencing documentation, etc. These measures are global and do not
provide enough granularity or insight into mental processes to identify specific USI problems
or their causes, so they are of little value for redesign efforts.

HCI researchers have also developed a low-level class of measures which may be classified
as USI-specific, task-independent measures. Examples of these are input device measures
such as mouse movement distance, screen-layout measures for assessing screen complexity
(Tullis, 1984), legibility measures such as reading rate, and measures for calculating optimal
depth vs. breadth for menu hierarchies. These measures focus on a very narrow aspect of the
user interface and are valuable for fine tuning but not for overall diagnostics.

The goal of this project was to develop a new class of measures and a corresponding
methodology which assess the directness of the engagements supported by the USI. We call
this class "measures of the directness of engagements" (table 1). These measures would build
on the concepts of distance discussed above. The measures would be cognitive, reflecting
the mental nature of the tasks the interface supports, and theory-based, which helps direct the
evaluation metrics and keeps them generic.
Table 1. Examples of Three Classes of USI Effectiveness Measures

Class                                         Concepts/Measures

Traditional behavioral performance measures   Correctness of decision
                                              Optimization of resources
                                              Task time
                                              Number of errors

Measures of the directness of engagements     Semantic distance of execution
                                              Semantic distance of evaluation
                                              Articulatory distance of execution
                                              Articulatory distance of evaluation
                                              Error analysis by stages

USI-specific, task-independent measures       Menu design
                                              Mouse/keyboard measures
                                              Display legibility
                                              Screen complexity


1.6  SUMMARY 

A rationale for the need for a new class of measures of user-system interface effectiveness
was presented. To effectively evaluate graphical, direct-manipulation style interfaces,
measures which are both cognitive and theory-based are required. The measures need to reflect
concepts important to the usability of these types of systems. This includes directness in the
areas of semantic and articulatory distance of execution and evaluation. In the next section, we
further develop these concepts by deriving operational definitions and indicators based on these
concepts, and describe a methodology and data collection techniques needed to support
generation of the developed indicators.
SECTION  2


A  METHOD  FOR  MEASURING  DIRECTNESS  OF  ENGAGEMENTS 


In this section, a method for evaluating graphical, direct-manipulation style interfaces is
presented. What is needed is a method which describes directness in enough detail to provide
indications of how to enhance the design, but not in so much detail that the overall
application is lost from sight. We took the inverse approach: rather than attempting to measure every
engagement to assess its directness, we attempted to develop indicators which reflect
indirectness in any of the stages of user activity.

In this section, we discuss our derivation of indicators of semantic and articulatory
indirectness based on Norman's model and Hutchins et al.'s elaboration of the model. The
concept of a usability study and the data collection techniques are discussed next, followed by the
development of a new encoding scheme and analysis techniques for extracting USI
effectiveness indicators of indirectness. The following section gives an example of
applying the method to evaluate a system.

2.1  DERIVATION  OF  OPERATIONAL  DEFINITIONS  AND  INDICATORS 
FROM  THE  MODEL 

The concepts of directness presented in section 1 are a useful starting point for assessing
the usability of graphical interfaces, but further definition of the concepts is required before
they can be used in practice. Our first decision was to concentrate on detecting when
indirectness or large distances exist in an interface. It is easier and more useful to detect
problems in directness than to quantify the directness of every HCI engagement. The concepts
we are trying to measure are somewhat subjective and still require interpretation, but it is
possible to flag and identify potential problem areas.

We derived definitions and related indicators to address the different concepts of directness.
These are shown below in Tables 2-4. The indicators were developed based on our
experiences and some initial pilot testing. For semantic and articulatory indirectness of
execution and evaluation, we defined potential contributors to each type of user interface
indirectness (e.g., semantic indirectness exists if ...), and for each cause we provided an initial
list of observable indicators/resulting user behaviors which would occur if the problem was
present (tables 2 and 3). For instance, semantic indirectness would occur if the user interface
lacked a high-level object that the user expected to be there. An indicator that this problem
was present would be repetitious actions performed on lower-level objects. In addition to
semantic and articulatory problems, there can also be problems which occur during the stages of
execution and perception. For instance, if a display output is difficult to perceive, indicators
would be the user performing frequent actions to improve the perceptibility of the display or
making errors in perception (table 4).
Tables 2-4 refer to expressions, steps, and actions. We define an expression as the total
command sequence required to fulfill an intention. A step is a logical computer grouping of
actions. Actions are the lowest level of input, and their definition varies depending on what
you want to learn. For example, a step may be the sequence of actions required to move a
word in a document, and the actions may include selecting the word, dragging the word, and
dropping the word in a new location.
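One way an analysis tool might hold the expression/step/action hierarchy just defined is as nested records. The sketch below is an assumption about representation, not the report's own data format; the class names are hypothetical, and it encodes the word-moving example from the text.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Action:
    """Lowest level of input; its granularity is chosen by the analyst."""
    name: str


@dataclass
class Step:
    """A logical computer grouping of actions, e.g. moving a word."""
    name: str
    actions: List[Action] = field(default_factory=list)


@dataclass
class Expression:
    """The total command sequence required to fulfill one intention."""
    intention: str
    steps: List[Step] = field(default_factory=list)

    def action_count(self) -> int:
        # Total number of actions across all steps of the expression.
        return sum(len(step.actions) for step in self.steps)


move_word = Step("move word", [Action("select word"), Action("drag word"), Action("drop word")])
expr = Expression("move a word in a document", [move_word])
print(expr.action_count())  # 3
```

Counts such as `action_count` map directly onto indicators in the tables below, for example "many steps/actions required to complete intention".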

Table 2. Causes of Semantic Indirectness of Execution and Evaluation and the
Corresponding Observable Indicators

Semantic indirectness of execution if:        Indicator
User intention not supported                  • User states the desire to perform the missing function
                                              • Attempting to execute unsupported function, forced to abort
Missing high-level object                     • Same step or set of actions repeated on lower-level objects
Complex expression required                   • Many steps/actions required to complete intention
to accomplish intention                       • Errors in step order
                                              • Incomplete/aborted intentions

Semantic indirectness of evaluation if:       Indicator
Extra step(s) required to                     • Number and purpose of steps performed (e.g., to get information, or "check" something)
perform an evaluation
Difficult or impossible to                    • Frequency and types of evaluation errors
perform an evaluation                         • Evaluation not made
Table 3. Causes of Articulatory Indirectness of Execution and Evaluation
and the Corresponding Observable Indicators

Articulatory indirectness of execution if:    Indicator
Complex steps                                 • Number of actions needed to perform a single step
                                              • Errors in performing the step:
                                                - sequence of actions is incorrect
                                                - omit action in step
                                                - add extra action to step
                                              • Aborted step
Poor match between single                     • False action match
action meaning and its form                   • Difficulty locating/identifying action

Articulatory indirectness of evaluation if:   Indicator
Complex display output to                     • Steps or actions needed to perform a complete interpretation
interpret                                     • Error and frequency of errors in interpretation
Poor match between display                    • Frequency and types of errors in interpretation
output form and its meaning
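Some of the Table 3 indicators for complex steps (incorrect sequence, omitted action, extra action) can be checked mechanically once the expected actions for a step are known. The sketch below assumes the analyst supplies that expected sequence; the function names and the multiset-based comparison are our illustrative choices, not part of the method as described.

```python
from collections import Counter
from typing import List, Set


def _restrict(sequence: List[str], allowed: Counter) -> List[str]:
    """Keep only actions present in `allowed`, respecting multiplicity, in original order."""
    remaining = Counter(allowed)
    kept = []
    for action in sequence:
        if remaining[action] > 0:
            kept.append(action)
            remaining[action] -= 1
    return kept


def classify_step_errors(expected: List[str], observed: List[str]) -> Set[str]:
    """Compare an observed step with its expected action sequence.

    Returns a subset of {"omitted action", "extra action", "incorrect sequence"};
    an empty set means the step was performed as expected.
    """
    exp, obs = Counter(expected), Counter(observed)
    errors = set()
    if exp - obs:
        errors.add("omitted action")
    if obs - exp:
        errors.add("extra action")
    # Check order only over the actions both sequences share, so an omission
    # or addition is not also counted as a sequencing error.
    shared = exp & obs
    if _restrict(expected, shared) != _restrict(observed, shared):
        errors.add("incorrect sequence")
    return errors


MOVE_WORD = ["select word", "drag word", "drop word"]
print(classify_step_errors(MOVE_WORD, ["drag word", "select word", "drop word"]))  # {'incorrect sequence'}
```

Aborted steps and false action matches are not covered here; those still depend on knowing what the user meant to do.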


Table 4. Other Indicators Reflecting Manual or Perceptual Difficulties

Problem                                       Indicator
Poor perceptibility of                        • Frequency of steps/actions performed to improve perception
information on the display                    • Frequency and types of perceptual errors

Poor manual interaction with                  • Frequency of steps/actions performed to make manual interactions easier to accomplish
the system                                    • Frequency and types of manual/execution errors

Ease of error recovery                        • Was error recovered from
                                              • Time between error and error recovery
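The two error-recovery indicators in Table 4 reduce to simple arithmetic once each error has been coded with the time it occurred and, if applicable, the time it was recovered from. The sketch below assumes that coding has already been done by the analyst; the `ErrorEvent` record, the function names, and the sample times are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional

TIME_FORMAT = "%H:%M:%S"  # clock format assumed to match the history log


@dataclass
class ErrorEvent:
    """One coded error; recovery_time is None if the user never recovered."""
    error_time: str
    recovery_time: Optional[str] = None


def seconds_between(start: str, end: str) -> int:
    # Assumes both times fall on the same day (no midnight rollover).
    delta = datetime.strptime(end, TIME_FORMAT) - datetime.strptime(start, TIME_FORMAT)
    return int(delta.total_seconds())


def summarize_recovery(errors: List[ErrorEvent]) -> dict:
    """Was each error recovered from, and how long did recovery take on average?"""
    recovered = [e for e in errors if e.recovery_time is not None]
    delays = [seconds_between(e.error_time, e.recovery_time) for e in recovered]
    return {
        "errors": len(errors),
        "recovered": len(recovered),
        "mean_recovery_s": sum(delays) / len(delays) if delays else None,
    }


# Hypothetical coded errors, for illustration only.
sample = [
    ErrorEvent("11:32:41", "11:32:45"),
    ErrorEvent("11:33:10"),
    ErrorEvent("11:34:00", "11:34:10"),
]
print(summarize_recovery(sample))  # {'errors': 3, 'recovered': 2, 'mean_recovery_s': 7.0}
```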
2.2  DATA  COLLECTION  TECHNIQUES  AND  MANIPULATIONS  NEEDED
TO  SUPPORT  THE  INDICATORS 

The technique best suited for extracting the above indicators during an evaluation is a
usability study. The indicators depend heavily on understanding how real users think
about tasks and the ease with which they can use the computer system. Usability studies are a
form of user-based evaluation which can be used to evaluate user interactions with a computer
system (see MTR 92B0000047, vols. 1 and 2, for discussions of other techniques). Usability
studies (figure 4) involve collecting objective and/or subjective data on users interacting with a
system or prototype. Such studies are designed to evaluate the quality of particular products or
prototypes and to improve and perfect product design, with the understanding of human
behavior a useful and important by-product of the evaluation procedures (Holleran, 1991).


Figure 4. User-Based Evaluation (inputs: history files, manually or automatically collected; verbal protocols; questionnaire data; output: evaluation of USI effectiveness)

When collecting data during a usability-style USI evaluation, the purpose is to get as
complete a description of the user's interaction with the system as possible. A complete
description of what was done, and as much information as possible on why it was done, is
desirable. There are essentially four primary means of collecting data on human-computer interaction:

•  video  recordings  of  the  display  and  users'  gross  activities, 

•  a  history  log  of  keystroke  and  mouse  input  data,  which  is  often  time-stamped, 

•  thoughts voiced aloud by the subjects, and

•  questionnaires. 

None of these methods alone, however, provides enough information to determine the degree
to which the user interface supports directness of engagements.
Video recordings of the display show us the results of the user inputs. The videotape
record of a user-system interaction is a valuable technique for identifying usability problems.
The advantage of videotape is that it provides a complete, continuous, and real-time record of
the behavior of both the user and the system (Prasse, 1990). Videotape preserves the content,
sequence, and timing of actions that occur in the user-system interaction. These records can
then be re-examined during the data analysis phase of usability testing.

History logs of keystroke and mouse input data provide us with a clear picture of what
users did and when they did it. Software can be augmented to capture all user actions,
including keyboard entries, mouse movements, menu selections, button presses, and icon
manipulations. A portion of a sample keystroke file is shown below.


Line #   Time       Elapsed (s)   User input
001      11:32:39   000           Pressed Button on View Button in Main Menu Bar
002      11:32:41   002           Released Button on Date Button in View Menu
003      11:32:43   002           Pressed Button on Cancel Button in Time Dialog
004      11:32:45   002           Pressed Button on View Button in Main Menu Bar
005      11:32:47   002           Released Button on Change Layout Button in View Menu
006      11:32:52   005           Pressed Button on Undisplayed SUA List in General Layout Dialog
007      11:32:58   006           Pressed Button on Add Button in General Layout Dialog
008      11:32:59   001           Pressed Button on Undisplayed SUA List in General Layout Dialog
009      11:33:01   002           Pressed Button on Add Button in General Layout Dialog
010      11:33:02   001           Pressed Button on OK Button in General Layout Dialog
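Because the history file is line-oriented, it lends itself to simple automated processing. This sketch assumes each event occupies one whitespace-separated line in the column order shown above (the exact file format written by the logging software is not given here). It parses the lines and checks that the Elapsed column equals the seconds since the previous event; `parse_log` and `elapsed_is_consistent` are hypothetical names.

```python
import re
from datetime import datetime

# One event per line: line number, clock time, elapsed seconds, input description.
LOG_LINE = re.compile(r"^(\d+)\s+(\d{2}:\d{2}:\d{2})\s+(\d+)\s+(.+)$")

SAMPLE_LOG = """\
001  11:32:39  000  Pressed Button on View Button in Main Menu Bar
002  11:32:41  002  Released Button on Date Button in View Menu
003  11:32:43  002  Pressed Button on Cancel Button in Time Dialog
004  11:32:45  002  Pressed Button on View Button in Main Menu Bar
"""


def parse_log(text: str) -> list:
    """Parse history-log lines into dicts, skipping headers and non-event lines."""
    events = []
    for raw in text.splitlines():
        match = LOG_LINE.match(raw.strip())
        if match is None:
            continue
        line_no, clock, elapsed, description = match.groups()
        events.append({
            "line": int(line_no),
            "time": datetime.strptime(clock, "%H:%M:%S"),
            "elapsed": int(elapsed),
            "input": description,
        })
    return events


def elapsed_is_consistent(events: list) -> bool:
    """Check that each Elapsed value equals the seconds since the previous event."""
    return all(
        int((cur["time"] - prev["time"]).total_seconds()) == cur["elapsed"]
        for prev, cur in zip(events, events[1:])
    )


events = parse_log(SAMPLE_LOG)
print(len(events), elapsed_is_consistent(events))  # 4 True
```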


The history files provide a record of what the user did. The history file alone, however, is not
sufficient to extract all the measures of interest, because it cannot show what the user was trying
to do. From the excerpt above, for instance, we can see what the user did: the user selected a menu
command which opened a dialog box, and then closed the dialog box. Without knowing what
the user intended to do, we cannot assess whether this sequence of steps was the correct one to
accomplish her intention. If, for example, a user opens a dialog box and immediately cancels
out of it, the user could have opened it and realized it was the wrong dialog box, or the user
may have intentionally opened it to obtain some information from it. Without knowing the
intention, it cannot be determined whether a sequence was intentional or an error. In addition to
this ambiguity, with history logs alone, information on the users' stages of perception,
interpretation, and evaluation is not available.
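Although the log alone cannot resolve the open-then-cancel ambiguity, it can at least locate each occurrence, so the analyst knows where an intention must be recovered. This sketch assumes input descriptions follow the "... in <Name> Dialog" wording of the sample log; `opened_then_cancelled` is a hypothetical name, and it only flags candidates: whether each one is an error or a deliberate information check still has to be decided from the protocol.

```python
import re
from typing import List

# Captures the dialog named at the end of a logged input, e.g. "... in Time Dialog".
DIALOG = re.compile(r" in (?P<dialog>.+ Dialog)$")


def opened_then_cancelled(inputs: List[str]) -> List[int]:
    """Return indices of Cancel presses that are the first input logged in their dialog."""
    flagged = []
    for i, entry in enumerate(inputs):
        match = DIALOG.search(entry)
        if match is None or "Cancel Button" not in entry:
            continue
        previous = inputs[i - 1] if i > 0 else ""
        # If the previous input was not in the same dialog, the user cancelled
        # without doing anything else there; the intention is unknown from the log.
        if not previous.endswith(" in " + match.group("dialog")):
            flagged.append(i)
    return flagged


inputs = [
    "Pressed Button on View Button in Main Menu Bar",
    "Released Button on Date Button in View Menu",
    "Pressed Button on Cancel Button in Time Dialog",
    "Pressed Button on View Button in Main Menu Bar",
]
print(opened_then_cancelled(inputs))  # [2]
```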


History logs or files are useful, however, in that they provide a detailed description of all
user inputs which would be difficult to get from the videotape alone. Also, there are some
indicators of USI problems that can be extracted from history files. For instance, repetitive
sequences of actions could be an indicator that the user is not able to apply an action to many
objects, or to a high-enough-level object, at once. So object-level information, or the need for
grouping functions, a semantic measure of execution, could be extractable. You can also
extract some indicators of execution problems, such as information on typos, backspaces, and
frequency and duration of dragging of items. Some articulatory distance indicators are
available, such as menu search activity (e.g., menu, menu, menu), which implies a search for a
command, performing actions in the wrong sequence, omitting actions in a sequence, and
similar types of error. Finally, timing information between input events is available. This type
of information is not useful when task or decision times are not critical system drivers. Also,
system timing may be misleading when the system being evaluated is a prototype.
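Two of the history-file indicators named above, repetitive action sequences (a possible missing high-level or grouping object) and menu search activity (a possible articulatory problem), can be detected with simple pattern scans. This sketch is illustrative: the function names, the maximum pattern length, the three-press threshold, and the "in Main Menu Bar" marker (taken from the sample log's wording) are all assumptions, and a flagged pattern is only a candidate for the analyst to check against the protocol.

```python
from typing import List, Tuple


def repeated_runs(actions: List[str], max_len: int = 3) -> List[Tuple[int, tuple, int]]:
    """Find back-to-back repetitions of the same action subsequence.

    Returns (start index, repeated subsequence, repetition count) triples,
    preferring at each position the pattern that covers the most actions.
    """
    found = []
    i = 0
    while i < len(actions):
        best = None  # (pattern length, repetitions)
        for k in range(1, max_len + 1):
            reps = 1
            while (i + (reps + 1) * k <= len(actions)
                   and actions[i + reps * k: i + (reps + 1) * k] == actions[i: i + k]):
                reps += 1
            if reps >= 2 and (best is None or k * reps > best[0] * best[1]):
                best = (k, reps)
        if best is None:
            i += 1
        else:
            k, reps = best
            found.append((i, tuple(actions[i: i + k]), reps))
            i += k * reps
    return found


def menu_search_runs(inputs: List[str], min_run: int = 3,
                     marker: str = "in Main Menu Bar") -> List[Tuple[int, int]]:
    """Return (start index, length) of runs of >= min_run consecutive menu-bar presses."""
    runs = []
    start = None
    for i, entry in enumerate(inputs + [""]):  # empty sentinel closes a trailing run
        if entry.endswith(marker):
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_run:
                runs.append((start, i - start))
            start = None
    return runs


# Entries 006-010 of the sample log, abbreviated to the object pressed.
actions = ["Undisplayed SUA List", "Add Button", "Undisplayed SUA List", "Add Button", "OK Button"]
print(repeated_runs(actions))  # [(0, ('Undisplayed SUA List', 'Add Button'), 2)]

# Hypothetical menu search: three menu-bar presses before a command is chosen.
menu_inputs = [
    "Pressed Button on File Button in Main Menu Bar",
    "Pressed Button on Edit Button in Main Menu Bar",
    "Pressed Button on View Button in Main Menu Bar",
    "Released Button on Change Layout Button in View Menu",
]
print(menu_search_runs(menu_inputs))  # [(0, 3)]
```

In the sample log, the repeated "select SUA, press Add" pair is exactly the kind of repetition that could point to a missing multi-select or grouping function.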

The audio recordings of users asked to "think aloud" during task performance provide us
with some understanding of why users did what they did. This last technique brings us
closest to understanding the cognitive aspects of HCI, including what the user's goals and
intentions were, and their assessment of the result of their actions.

We need to understand the user's intention as well as the steps used to fulfill the intention
in order to assess the directness and determine the success of the following stages. The
information collected in the history file needs to be integrated with a description of the user's
intention and goals, as well as their interpretation and evaluation of the proceedings. The data
is, however, collected in different formats and is typically not synchronized in time across
formats. Thus, integrating and summarizing the data poses a challenge. Below is an example
of an integrated history/transcribed-protocols data file:

"Alright. Okay, so I want to see that week."

001  11:32:39  000  Pressed Button on View Button in Main Menu Bar
002  11:32:41  002  Released Button on Date Button in View Menu
003  11:32:43  002  Pressed Button on Cancel Button in Time Dialog
"Well, I probably need airspaces up there first."

004  11:32:45  002  Pressed Button on View Button in Main Menu Bar
005  11:32:47  002  Released Button on Change Layout Button in View Menu
"Who am I again? Phoenix."

006  11:32:52  005  Pressed Button on Undisplayed SUA List in General Layout Dialog
"Ah, Yankee 1."

007  11:32:58  006  Pressed Button on Add Button in General Layout Dialog
008  11:32:59  001  Pressed Button on Undisplayed SUA List in General Layout Dialog
"Ah, Yankee 2."

009  11:33:01  002  Pressed Button on Add Button in General Layout Dialog
010  11:33:02  001  Pressed Button on OK Button in General Layout Dialog
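Part of this integration could be automated once the utterances carry time stamps, for example read from the video time code; that alignment step is the hard part the text refers to, and this sketch assumes it has already been done. It simply interleaves the two streams by clock time. The function name, record layout, tie-breaking rule, and utterance times are hypothetical; the transcript above carries no time stamps.

```python
from datetime import datetime
from typing import List, Tuple

Record = Tuple[str, str]  # (HH:MM:SS clock time, text)


def merge_streams(events: List[Record], utterances: List[Record]) -> List[Tuple[str, str, str]]:
    """Interleave history-log events and transcribed utterances in time order.

    Assumes both streams were already time-stamped against the same clock.
    When times tie, the utterance is placed first, on the assumption that
    the user voiced the intention before acting on it.
    """
    tagged = [(t, 0, "SAY", text) for t, text in utterances]
    tagged += [(t, 1, "LOG", text) for t, text in events]
    tagged.sort(key=lambda r: (datetime.strptime(r[0], "%H:%M:%S"), r[1]))
    return [(kind, t, text) for t, _, kind, text in tagged]


events = [
    ("11:32:39", "Pressed Button on View Button in Main Menu Bar"),
    ("11:32:41", "Released Button on Date Button in View Menu"),
    ("11:32:43", "Pressed Button on Cancel Button in Time Dialog"),
    ("11:32:45", "Pressed Button on View Button in Main Menu Bar"),
]
# Utterance times are hypothetical, for illustration only.
utterances = [
    ("11:32:38", "Alright. Okay, so I want to see that week."),
    ("11:32:44", "Well, I probably need airspaces up there first."),
]
for kind, t, text in merge_streams(events, utterances):
    print(kind, t, text)
```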

The integrated file adds a wealth of information to the history file, and a previously
ambiguous event now becomes clear to the data analyst. The user opened a dialog box and
then canceled it because she thought she needed to perform another step first. The user started
to perform a sequence of actions to fulfill an intention, made an evaluation of whether this was
a correct sequence of steps, decided it was not, aborted the current intention, and started a new
sequence of actions. This turns out to be an error in evaluation, because the originally planned
sequence of steps would have accomplished the intention. The computer design was not at
fault in this situation, because the action would have been disabled if it could not have been
done at that point in time. Combining the protocols and the history file allows a more complete
description of the human-computer interaction process. We can now determine the user's
intention, identify errors in stages such as action specification, and obtain information on
the two evaluation stages. It is interesting to note that what we have classified as an evaluation
error would not typically have been recorded in a traditional error log, as no user input error was
made.

With the data in the current form, it is still a lot of work to extract the explanation of events we
just described. Computing error frequencies and comparing sequences of events across subjects would
be hard, as all quantitative measures would need to be calculated manually. What is
required is a scheme for encoding the data that makes the scenario we just described very
explicit. Such a scheme is proposed below.

2.3  ENCODING  SCHEME 

A key contribution of this research effort was the development of an encoding scheme
to structure the data collected during user-based evaluations in a way which permits easy
evaluation of the directness of user-system interactions. After many iterations, an encoding
scheme which seemed to usefully structure the data collected on the user-system interaction
process was developed; it is loosely based on Norman's stages of user activity model. The
scheme is hierarchical in nature and, for the execution stages, somewhat similar to the Goals,
Operators, Methods, and Selection rules (GOMS) approach developed by Card et al.
(1983). GOMS is a hierarchical method of analyzing the sequence of activities required by an
interface for performing various tasks with the system.

Deciding what we wanted to learn from our usability studies was the first step in deriving the
encoding scheme. The primary purpose of a usability study is to address how well the
computer system supports the needs of a user during task performance. So far, we have
discussed measuring performance at the level of the directness of engagements. We certainly wanted
to capture the information necessary for this type of analysis. As was discussed in section 1, there
are also some higher-level, more traditional measures of human performance which address
how well the user performed the task overall. These include measures of the correctness of
decisions, task time, task completion success, quality of the generated output, etc. Some of
these measures are affected by the computer design, but some rest with the task skills and
experience of the human. For example, a very poor computer design could affect overall task
time and product quality. However, a user inexperienced with the task could also have a long
task time and poor product quality even when using a well-designed system. A user can interact
with the computer perfectly, in terms of making no HCI errors, but still fail to achieve task
objectives. Collecting information on how the human approaches a task, their problem-solving
strategy, their skill at the task level, etc. is necessary for drawing overall conclusions about
computer system effectiveness. We wanted to extract these measures, in addition to measures
of the directness of engagements, from our encoded data.




Using the stages of user activity model as a basis, our first idea for an encoding process
was to use every user intention as a starting point and supply an encoding for all of the
following stages from the model. This brute-force approach quickly runs into trouble in two
ways. First, adding the perception, interpretation, and evaluation stages to every user intention
greatly increases the amount of information, and, for the most part, these stages will just be
coded as having been performed correctly. Thus, these latter stages are of real interest only
when there is a problem in them. Also, the process of perception is continuous and difficult to
code as a discrete action. Only an error in perception or taking extra actions to improve
perception is observable. The second problem is identifying the appropriate level of intention
to encode.

At what level of interaction do we apply the cycle? As noted by Norman (1986), the HCI process can be broken down further and further until the level of a user's intention would be to move the cursor. As the cursor is moving, there would be a continuous perceptual activity. The completion of the cursor movement can then be interpreted and evaluated. When working at such a low level, a single cursor movement activity is broken down into many stages, which creates more data with very little information added. Other evaluation techniques such as GOMS (Card et al., 1983) and the cognitive walkthrough technique (Lewis et al., 1990) have the same problem: activities can be broken down lower and lower until the information may be below the level of interest. The best level to work at depends on the questions to be answered by the study. For our purpose, which is evaluating new prototyped software systems, we are not interested in the very lowest, keystroke level of interaction. We want to focus on the semantic and articulatory levels rather than on the detailed execution level. With that constraint in mind, a two-level encoding scheme was created: one level focusing on the user's cognitive strategies, or semantic level, and the second on the articulatory/execution level. The second level was implemented at a user-interface object level, rather than at an individual keystroke or cursor movement level.

2.3.1  Semantic  Level 

When trying to identify the stages associated with human-computer interaction activities, the first four stages of the user activity model (goal, intention, action specification, and execute) are easier to identify than the three stages spanning the gulf of evaluation.

Determining the user's goals and intentions is relatively straightforward, particularly if predetermined tasks are being performed. Information on action specification can be inferred from the observed executed actions, although the mental processes involved in formulating the sequence of actions are not observable. Examining sequential data records reveals information on whether users think they have accomplished their intention, which is the evaluation stage.

Based on these concepts, and the more traditional task measures of interest, the following encodings were developed at the semantic level of human-computer interaction. Each user goal corresponds to an overall objective, usually provided by the conductors of the usability study. The remaining codes allow us to categorize and identify the user's objectives.
Within each goal, we wanted to identify the task-space intentions the users have to accomplish the goal, their intentions to execute specific computer steps to accomplish a single task, and the intentions they form to improve the perceptibility of the computer workspace. Each of these user intentions is completed with a corresponding evaluation code that contains information on the success of their endeavors. Also included in this semantic encoding level are errors made in any of the stages of user activity. From this scheme, information can be obtained on the user's goals, intentions, problem-solving strategies, computer method strategies, goal achievement, errors in each stage of user activity, and whether or not each error was recovered from. Successfully applying the codes involves extracting and analyzing all the various types of data collected. Each code is defined more fully below.

Goal: Achievement to be obtained by the user, usually predetermined by the experimenter; the items to be accomplished in a task scenario.

Task intention: The intention to complete one task contributing to the completion of a goal. This is still in the user task space. For example, a goal may be to schedule all requested missions, but a task intention may be to schedule a particular mission.

Intention to execute: A task step to accomplish the given task intention. Each task intention is performed by a single intention to execute or by a sequence of them. The intention to execute is the description of the step the user wants to perform. For example, opening a dialog box, performing some functions in the box, and closing the box often characterized a single task step; the purpose for executing that step is the intention to execute. One or more such steps may be required to perform a single task intention. In the semantic encoding level, the intention to execute was not broken down further. The details of the individual actions in the step are provided in the articulatory level.

Intention to improve perception: Task intentions were characterized by their intent to move users closer to their goal, that is, to the completion of a particular task. However, many direct manipulation systems are graphical in nature, and an artifact of using them is that the user sometimes is forced to take one or more steps to improve the perceptibility of the work area. This is distinguishable from the task intention in that it is necessary but does not directly move the users closer to their goal, nor is it done with the intent to complete a task. It is on the same level as task intentions because it interrupts the completion of a goal and has its own set of intentions to execute. Finally, it is an artifact that the system design should minimize, and it is therefore identified separately from other task intentions.

Evaluation: Each intention, whether for a task, an execution, or perception, is closed off with a corresponding evaluate state which encodes our interpretation of the success of the endeavor. This state can be coded as:

"OK", meaning proper completion of the corresponding goal, task, or execute;

"Inc", meaning the user has not fully completed the corresponding intention (e.g., changed the time correctly but forgot to convert it to EST units);

"abort", meaning the user has abandoned the corresponding goal, task, or execute; or

"wrong", meaning the series of actions chosen to accomplish the task or execute was not the right sequence of actions.

Errors  detected  at  any  stage  were  included  in  the  semantic  level  encodings. 
They  are  defined  as  follows. 

Error in intention: Occurs when the user is making the classically defined "mistake". The intent of what they want to do will not move them closer to their goal, even if they execute the intent correctly. This type of error is often independent of the user interface, and may be the result of misunderstanding the task or forgetting some of the task details. For example, if a user was asked to create a schedule for the time period 8-12 August, and they instead create it for the time period 8-12 September because they thought September was the correct month, they have made an error in intention. Note that if they meant to schedule for August but inadvertently selected September rather than August from a list of months, this would be an execution error, and, if they failed to notice the error, an error in evaluation as well. This highlights the fact that the type of error cannot be ascertained from the history log of user inputs alone; an understanding of the user's thought process is also required.

Error in action specification: Occurs if the user performs an action or series of actions on the computer which are not correct to accomplish the intended step. This can range from having to search through menus to locate a command, to calling the wrong dialog box, to omitting actions in a sequence of actions, etc.

Error in execution: Occurs if the manual interaction with the computer is not as intended. These types of errors include typos, selecting an item adjacent to the intended item, etc.

Error in perception: Occurs if the user encounters difficulty thought to be caused by a breakdown in the perceptual processing of the display. For instance, the user schedules the wrong mission because the mission icons were small and close together, or the status indicators were too small to be read correctly.
Error in interpretation: Occurs if the user perceives the information correctly but fails to interpret the system state correctly. For example, if a mission icon turns red and the user determines this to mean "scheduled and OK" when in fact it means "scheduled and in conflict with another mission", an error in interpretation has occurred.

Error in evaluation: Occurs when the user performs the evaluation stage, determining whether s/he moved closer to his/her goal, and either thinks s/he did move closer when s/he did not, or vice versa. An example would be a user who starts to perform the correct sequence of actions in an execute, thinks he needs to do some other action first to achieve his goal when he does not, and abandons his set of steps to do an unrequired sequence of actions first.

Error recovery: Occurs when a user notices that s/he has made an error and corrects it. Usually, typos and other low-level errors are detected almost immediately and corrected. If an error is not noticed by the time the user leaves the task, the task is usually coded as incomplete.
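The error-recovery rule can be checked mechanically once a task's codes are in order. The sketch below is an illustration only, not the authors' tooling: it assumes each task's semantic-level codes are available as a flat list of labels such as `"Err.exec"` and `"Rec.err"`, and that each `Rec.err` closes the most recent open error. Both function names are hypothetical.

```python
from typing import List, Optional


def unrecovered_errors(codes: List[str]) -> List[str]:
    """Return the error codes in one task's code stream that were never recovered.

    Assumes (hypothetically) that each "Rec.err" closes the most recent
    still-open "Err.*" code.
    """
    open_errors: List[str] = []
    for code in codes:
        if code.startswith("Err."):
            open_errors.append(code)
        elif code == "Rec.err" and open_errors:
            open_errors.pop()
    return open_errors


def suggested_evaluation(codes: List[str]) -> Optional[str]:
    """Suggest "Inc" when an error was still open when the user left the task.

    Returns None otherwise, leaving the OK/abort/wrong decision to the analyst.
    """
    return "Inc" if unrecovered_errors(codes) else None


# A typo that is caught, then an action-specification error that is not.
stream = ["Int.exe", "Err.exec", "Rec.err", "Int.exe", "Err.acsp"]
print(unrecovered_errors(stream))    # ['Err.acsp']
print(suggested_evaluation(stream))  # Inc
```

The function only proposes "Inc"; as the text notes, the final coding still rests on the analyst's reading of the protocols.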

The encodings are summarized in table 5.

The coding scheme at this level gives information on the user's overall strategy, the number and types of steps performed to carry out a task intention, whether each step was performed correctly at the computer level, whether the task was the right task to meet the goal, whether it was carried out completely, where and what types of errors were made at each stage of user activity, and whether each error was recovered from. Information on the actual sequence of commands and user inputs selected is not included at this level but at the articulatory level. We believe the semantic level is generic enough to be applied to most applications with direct-manipulation style interfaces.
Table 5. Semantic-Level Encodings

Encoding                                  Definition
Goal                                      Scenario step.
Task intention (Int.task)                 An intention to complete one task contributing to the completion of a goal.
Perception intention (Int.per)            An intention to improve the perceptibility of a display.
Intention to execute (Int.exe)            One computer step (may be comprised of multiple actions) leading to the completion of a task intention. Several steps may be required per task intention.
Evaluate (Eval)                           The success with which the intention was accomplished.
Error in intention (Err.int)              The intention was incorrect and will not accomplish the goal.
Error in action specification (Err.acsp)  Wrong sequence of actions to accomplish the intention to execute.
Error in execution (Err.exec)             Manual, motor error in executing.
Error in perception (Err.per)             Breakdown in human perceptual processing of information on a display.
Error in interpretation (Err.inter)       User fails to interpret system state correctly.
Error in evaluation (Err.eval)            User mistakenly thinks s/he has or has not moved closer to the goal.
Recovered error (Rec.err)                 Error was detected and recovered from.
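For analysis scripts, the Table 5 codes can be held in a fixed vocabulary so that a mistyped label is rejected rather than silently counted. The Python sketch below is one way to do this; the label strings follow the Table 5 abbreviations, while the class names, `ERROR_CODES`, and `parse_code` are hypothetical.

```python
from enum import Enum


class SemanticCode(Enum):
    """Semantic-level encodings; the values are the Table 5 abbreviations."""
    GOAL = "Goal"
    TASK_INTENTION = "Int.task"
    PERCEPTION_INTENTION = "Int.per"
    INTENTION_TO_EXECUTE = "Int.exe"
    EVALUATE = "Eval"
    ERR_INTENTION = "Err.int"
    ERR_ACTION_SPEC = "Err.acsp"
    ERR_EXECUTION = "Err.exec"
    ERR_PERCEPTION = "Err.per"
    ERR_INTERPRETATION = "Err.inter"
    ERR_EVALUATION = "Err.eval"
    RECOVERED_ERROR = "Rec.err"


class EvalOutcome(Enum):
    """The four values an Eval code can take."""
    OK = "OK"
    INCOMPLETE = "Inc"
    ABORT = "abort"
    WRONG = "wrong"


# The six error codes (Rec.err marks a recovery, not an error).
ERROR_CODES = frozenset(c for c in SemanticCode if c.value.startswith("Err."))


def parse_code(label: str) -> SemanticCode:
    """Map a label from an encoded file to its code; raises ValueError if unknown."""
    return SemanticCode(label)
```

For example, `parse_code("Err.acsp") in ERROR_CODES` is true, while `parse_code("Err.foo")` raises `ValueError`.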
2.3.2  Articulatory  Level


The articulatory encoding level focuses on the actual sequences of commands and user inputs required to perform each intention to execute; the encodings are hierarchical, and the articulatory level is the lowest level. Execution could be encoded as a single keystroke or as a series of mouse movements and actions, depending again on the level at which the analyst wishes to break down the data. We attempted to keep the recorded actions at the user-interface object level. Thus, the actions encoded were the use or selection of: menus, commands, buttons, window manipulations, selections from lists of items, data-entry field actions, and scroll bar actions.

These are generic user-interface objects. Selecting a command could, for example, result in an action being taken, such as approving a mission, or it could result in the opening of a dialog box. For each, the actual menu, command, or button name is recorded. The manipulation of some application-specific objects could also be of interest, and encodings for them should be added on an individual system basis. The articulatory encodings are shown in table 6.

Table 6. Articulatory-Level Encodings

Encoding                      Definition
Menu                          A menu was opened
Command                       A command was selected
List-Select                   An item was selected from a list
Button                        A button was selected
Field                         An action was taken in a field
Scroll                        A scroll bar action was performed
Window                        A window action was performed
Application-specific objects  Encodings to track the manipulation of application-specific objects


An example is given below in figure 5 to illustrate the hierarchical nature of the encodings at both the semantic and articulatory levels, and the relationships between them.

2.4  EXTRACTING  USI  INDICATORS  FROM  THE  ENCODED  DATA 

This encoding scheme provides a generic structure to be applied to integrated sequential data. Since analysis and interpretation of the various sequential data streams is needed to apply the encodings, encoding is itself a form of analysis, and it provides more information than any one type of sequential data alone or the uncoded integrated data file alone. Additionally, the data is now in a form amenable to quantitative analysis. From the encoded data, much information on the effectiveness of the human-computer interaction process can be extracted.
Goal 1
    Task intention (task A)
        Intention to execute (step 1)
            Button 1
            Execute error
            Button 2
            Recover from error
        Evaluate step 1
        Intention to execute (step 2)
            Menu
            Command
            Error in action specification
            Menu
            Command
            Recover from error
            Button
        Evaluate step 2
    Evaluate task A
    Perceptual intention (task C)
        Intention to execute (step 1)
            Timebar
            Timebar
        Evaluate step 1
    Evaluate perception
Evaluate Goal 1

Figure 5. Conceptualization of the Encoding Hierarchy
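Because each intention is closed by its own Evaluate, an encoded file behaves like a tree. The sketch below shows one possible in-memory representation, assuming each Evaluate is stored as the closing label of the intention it belongs to; `Node` and `render` are hypothetical names. It rebuilds the task A, step 1 portion of figure 5 and prints it as an indented outline.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """One entry in the encoding hierarchy (goal, intention, or action)."""
    label: str
    closing: Optional[str] = None  # e.g. "Evaluate step 1"; None for leaf actions
    children: List["Node"] = field(default_factory=list)

    def add(self, label: str, closing: Optional[str] = None) -> "Node":
        child = Node(label, closing)
        self.children.append(child)
        return child


def render(node: Node, depth: int = 0) -> List[str]:
    """Indent children one level; emit each closing Evaluate at its opener's depth."""
    pad = "    " * depth
    lines = [pad + node.label]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    if node.closing:
        lines.append(pad + node.closing)
    return lines


# Rebuild the task A, step 1 portion of figure 5.
goal = Node("Goal 1", "Evaluate Goal 1")
task_a = goal.add("Task intention (task A)", "Evaluate task A")
step1 = task_a.add("Intention to execute (step 1)", "Evaluate step 1")
for label in ["Button 1", "Execute error", "Button 2", "Recover from error"]:
    step1.add(label)
outline = render(goal)
print("\n".join(outline))
```

With the hierarchy held as a tree, per-level counts (steps per task intention, actions per step) become simple traversals.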


At the highest level, one can see the users' problem-solving strategies, the order in which they attempted to accomplish each goal, their task and perceptual intentions, the order of their intentions, the method selected to accomplish each intention, the success of the various levels of endeavors, whether they successfully completed each of the goals assigned, and where and what types of errors were made. At the lower, articulatory level, one can see the actual actions the user selected to accomplish each task and perceptual intention, the number of actions per intention, the order in which they were taken, whether an execute error was made, etc.
The indicators of USI indirectness (given previously in tables 2-4, section 1) can now be extracted from the encodings. The indicators for semantic indirectness and how they can be identified are discussed below.

Semantic  Indirectness  of  Execution 

•  User  intention  not  supported 

Semantic indirectness exists if a user has an intention which the computer does not support at all. If the user knows the intention is not supported by the system, s/he may voice the desire aloud. This information can be extracted from the protocols. For instance, a user may say "I wish I could bring up all parts of this mission on the screen at once." There is no function which supports this desire, but the request for it is now known, and whether it should be added can be evaluated.

A second indicator of the same problem is an aborted attempt to execute an unsupported intention; in this case, the user does not know the intention is not supported. This would be coded, for example, as Task intention (show all parts of mission A), followed by an intention to execute and a string of actions. The user will not be able to accomplish his intention, so the corresponding evaluates will be coded as aborts.

•  Missing  high-level  object 

A further case of semantic indirectness occurs when a task can be performed but not at the level desired. This results in applying an action repetitively to a series of lower-level objects. For instance, a user may wish to change all four parts of a mission from 9:00 to 8:00 at once. Since this is not supported by the system, a series of four intentions to execute, one applied to each part of the mission, will be required to accomplish the intention to change the times of that mission. Thus, an indicator of this condition is that the same steps occur repeatedly to accomplish a single task intention. Note that the actual actions used to accomplish each execution may or may not be identical, as there may be alternative methods of executing a time change.
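The repeated-step indicator can be computed directly from the semantic-level encodings. The sketch below assumes the analyst has already grouped each user's intentions to execute under their task intention as plain step labels; the data layout, step names, and the `repeated_steps` function are hypothetical. Steps are compared at the intention-to-execute level rather than the action level because, as noted above, the actions behind each repetition may differ.

```python
from collections import Counter
from typing import Dict, List


def repeated_steps(steps_by_task: Dict[str, List[str]],
                   min_repeats: int = 2) -> Dict[str, Dict[str, int]]:
    """Return, per task intention, the step labels that occur at least min_repeats times."""
    flagged: Dict[str, Dict[str, int]] = {}
    for task, steps in steps_by_task.items():
        repeats = {step: n for step, n in Counter(steps).items() if n >= min_repeats}
        if repeats:
            flagged[task] = repeats
    return flagged


# Hypothetical encoded data modeled on the mission-time example in the text.
encoded = {
    "change times of mission A": ["change time of one mission part"] * 4,
    "schedule mission B": ["open schedule form", "enter time", "approve mission"],
}
print(repeated_steps(encoded))
# {'change times of mission A': {'change time of one mission part': 4}}
```

A flagged task is only a candidate: a step repeated because the user retried after an error would also appear, so the analyst still checks the error codes before attributing the repetition to a missing high-level object.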

•  Complex  expression  required 

Complexity of the expression required to accomplish an intention is a cause of semantic indirectness. If many steps (intentions to execute) are required, the correct steps are not obvious, or the order of steps is not immediately apparent, the expression could be labeled complex. The indicators of this condition are: the number of steps per task or perceptual intention, missing or extra steps, and steps performed in the wrong sequence. Other indicators would be task or perceptual intentions that are aborted, incomplete, or wrong. This information is now easily extracted from the encoded data, since we know what the user intention was, we have all the information on the actual sequence of steps performed to execute the intention, and the task intention evaluation provides data on the success of the endeavor.
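The complexity indicators listed above can be tallied per user. The sketch below is a minimal illustration under two assumptions not stated in the text: each intention is available as an ordered list of step labels with its Eval code, and the analyst supplies a reference ("expected") step sequence to compare against. `IntentionRecord`, `step_deviations`, and `complexity_summary` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List

UNSUCCESSFUL = ("Inc", "abort", "wrong")


@dataclass
class IntentionRecord:
    """One task or perceptual intention read from the semantic-level encodings."""
    name: str
    steps: List[str]  # intentions to execute, in the order performed
    evaluation: str   # "OK", "Inc", "abort", or "wrong"


def step_deviations(observed: List[str], expected: List[str]) -> Dict[str, object]:
    """Compare performed steps with an analyst-supplied reference sequence."""
    missing = [s for s in expected if s not in observed]
    extra = [s for s in observed if s not in expected]
    # Order check on the shared steps only. Note: a repeated shared step
    # also makes the two lists differ and is reported as out of order.
    shared_observed = [s for s in observed if s in expected]
    shared_expected = [s for s in expected if s in observed]
    return {"missing": missing, "extra": extra,
            "out_of_order": shared_observed != shared_expected}


def complexity_summary(intentions: List[IntentionRecord]) -> Dict[str, object]:
    """Steps per intention, and the intentions whose evaluation was not OK."""
    counts = [len(i.steps) for i in intentions]
    return {
        "mean_steps": sum(counts) / len(counts) if counts else 0.0,
        "max_steps": max(counts, default=0),
        "unsuccessful": [i.name for i in intentions if i.evaluation in UNSUCCESSFUL],
    }


print(step_deviations(["b", "a", "x"], ["a", "b", "c"]))
```

Here the user skipped step "c", added "x", and swapped "a" and "b", so all three deviations are reported.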
Semantic  Indirectness  of  Evaluation

•  Extra  step(s)  required  to  perform  an  evaluation

There are two indicators that semantic indirectness of evaluation exists; this type of indirectness is usually attributable to a lack of feedback, or poor feedback, in a system. One indicator involves having to perform steps (intentions to execute) such as "check whether mission A is now scheduled for 8:00" in addition to the expected executes required for a task. This indicates that the user needed additional information to evaluate whether the intention was successfully accomplished.

•  Difficult  or  impossible  to  perform  an  evaluation 

Semantic indirectness of evaluation also occurs if it is very difficult or impossible for the user to perform an evaluation. With very poor designs, a user may be totally unable to ascertain whether progress was made toward the execution of an intention. This inability to perform an evaluation is usually due to a lack of feedback on the part of the system. Indicators of this condition would be that an evaluation is not made (extracted from the protocols) and a high frequency of evaluation errors.
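Both indicators of semantic indirectness of evaluation can be expressed as simple per-user rates. The sketch below assumes a hypothetical analyst flag, `is_check`, marking intentions to execute performed only to verify an earlier result, and takes the user's semantic-level codes as a flat list of labels. Evaluations that were never made can only be found in the protocols, so they are not counted here. `Step` and `evaluation_indicators` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Step:
    """An intention to execute; is_check is a hypothetical analyst flag."""
    label: str
    is_check: bool = False  # True if the step only verified an earlier result


def evaluation_indicators(steps: List[Step], codes: List[str]) -> Dict[str, float]:
    """Two indicators of semantic indirectness of evaluation for one user.

    steps: that user's intentions to execute.
    codes: the same user's flat list of semantic-level codes.
    """
    n_checks = sum(1 for s in steps if s.is_check)
    n_evals = codes.count("Eval")
    return {
        "check_steps": n_checks,
        "check_step_share": n_checks / len(steps) if steps else 0.0,
        "eval_error_rate": codes.count("Err.eval") / n_evals if n_evals else 0.0,
    }


steps = [Step("reschedule mission A"),
         Step("check whether mission A is now scheduled for 8:00", is_check=True)]
codes = ["Int.task", "Int.exe", "Eval", "Int.exe", "Eval", "Err.eval", "Eval"]
print(evaluation_indicators(steps, codes))
```

Expressing both as rates rather than raw counts makes users who performed different numbers of steps comparable.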

There are also causes of articulatory indirectness of execution. This type of indirectness is affected by whether the form of the input action matches the intent of the action. Since more than one action is often required to carry out a single step, the ease of identifying the correct order of the actions is also important. Indicators of articulatory distance are discussed below.

Articulatory  Indirectness  of  Execution 

•  Complex  steps  required 

Complexity of a single step expression is a cause of articulatory indirectness. An indicator of indirectness would be a large number of actions per intention to execute. If, for example, changing the mission time of a single part of Mission A from 9:00 to 8:00 required 12 actions, this would be an indicator of indirectness. An example of a direct execution would be dragging the mission icon displayed on a schedule from the 9:00 time slot to the 8:00 time slot with one action. Obviously there is some subjectivity in how many actions are too many. The frequency with which each of the steps occurs will be an important factor in determining indirectness. Frequently performed steps should require the fewest actions for proper execution (unless the step frequency is due to a semantic distance problem, in which case the problem should be fixed there). Other indicators of step complexity are errors such as performing actions in the incorrect sequence, omitting actions in a sequence, and adding extra, unnecessary actions; these would be coded as errors in action specification. The number of actions per intention to execute is easily obtainable from our encoded data.
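Since both the number of actions per step and the step's frequency matter, the two can be computed together. The sketch below takes one `(step label, number of articulatory actions)` pair per executed step. The thresholds `max_actions=5` and `min_frequency=3` are arbitrary placeholders standing in for the analyst's judgment, not values from this study, and all names are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple


def actions_per_step(occurrences: List[Tuple[str, int]]) -> Dict[str, Dict[str, float]]:
    """occurrences: one (step label, number of articulatory actions) pair per executed step."""
    by_step: Dict[str, List[int]] = defaultdict(list)
    for step, n_actions in occurrences:
        by_step[step].append(n_actions)
    return {step: {"frequency": len(ns), "mean_actions": mean(ns)}
            for step, ns in by_step.items()}


def flag_complex_steps(stats: Dict[str, Dict[str, float]],
                       max_actions: float = 5, min_frequency: int = 3) -> List[str]:
    """Frequent steps whose mean action count exceeds max_actions (placeholder thresholds)."""
    return sorted(step for step, s in stats.items()
                  if s["frequency"] >= min_frequency and s["mean_actions"] > max_actions)


data = ([("change time of one mission part", 12)] * 3
        + [("approve mission", 1)] * 5
        + [("print schedule", 9)])
stats = actions_per_step(data)
print(flag_complex_steps(stats))  # ['change time of one mission part']
```

The rarely performed "print schedule" step is not flagged despite its nine actions, which reflects the text's point that frequently performed steps matter most.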




•  Poor  match  between  a  single  action  meaning  and  its  form 

Another cause of articulatory indirectness of execution is a poor match between the form and meaning of a single action. This can be indicated by errors in action specification of the false action match type (Lewis et al., 1990), where the wrong form seems to match the user's intent, so the user selects the wrong action. Another indicator of a poor match would be an inability to find or identify the right command in a set of menus, or difficulty in locating the correct action.

Articulatory  Indirectness  of  Evaluation 

•  Complex  display  output 

A complex display, or displays with a poor layout, are causes of articulatory indirectness of evaluation. An inability to easily interpret the system state, or having to perform many actions or absorb a lot of information to interpret the system state, are indicators that this condition exists. This can be determined from the protocols, from the frequency of errors in interpretation, and possibly from the sequence of actions performed during interpretation. Interpretation is difficult to assess, as it involves the transfer of information from the display to the user. Without special data collection equipment such as an eye-tracker, it is difficult to assess the number of pieces of information used or the order in which information was assessed for an interpretation.

•  Poor  match  between  display  output  form  and  its  meaning 

Another cause of articulatory indirectness of evaluation is a poor match between a display output form and its meaning. This will result in errors in interpretation, and actions may be taken to get more information to aid in the interpretation. For example, having to refer frequently to a legend or key to interpret icon meanings is an indicator that the icons are not directly conveying their meaning.

Indicators  reflecting  manual  or  perceptual  difficulty 

Indicators of perceptual and manual difficulty also exist. They are described below.

•  Poor  manual  interaction  with  the  system 

The ease of physically interacting with a system is a function of the input devices and their parameters, the fit of the input device capabilities to the task, the manual dexterity of the user, the limitations of the display real estate, etc. The best indicator of manual problems of execution is frequent errors in execution. Some of the same actions which are taken to improve perception (see below) also improve the ease of the manual interaction, because objects are made larger. Therefore, actions taken to increase the size of selectable objects could be an indicator of manual interaction difficulties.




•  Poor  perceptibility  of  information  on  the  display


The frequency of intentions to improve perception, or the need to adjust the workspace, are indicators of poor user perception. Having to improve the perceptibility of the workspace is a function of the task, the size and resolution of the display, the design of the user interface, the design of a particular display, and even the user strategy selected to perform a task with a given system. It may be infeasible to totally eliminate this type of function with current technology, but computer aids, innovative design, or encouraging particular strategies may help to minimize it. Often there is a trade-off between minimizing this type of action, the perceptibility of information on a display, and the amount of context available. When large amounts of information are viewed at once, context is increased and movement is minimized, but perceptibility is reduced. Very frequent intentions to improve perception indicate there may be a problem.
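The text gives no threshold for "very frequent" intentions to improve perception. One way to make users comparable is to normalize by the number of task intentions, as in the sketch below. The normalization itself is an assumption, the Int.per and Int.task labels follow Table 5, and `perception_load` is a hypothetical name.

```python
from collections import Counter
from typing import Dict, List


def perception_load(codes_by_user: Dict[str, List[str]]) -> Dict[str, float]:
    """Perceptual intentions (Int.per) per task intention (Int.task), for each user.

    Users with no task intentions are omitted rather than divided by zero.
    """
    load = {}
    for user, codes in codes_by_user.items():
        counts = Counter(codes)
        if counts["Int.task"]:
            load[user] = counts["Int.per"] / counts["Int.task"]
    return load


print(perception_load({
    "P1": ["Int.task", "Int.per", "Int.per", "Int.task"],
    "P2": ["Int.task"] * 4 + ["Int.per"],
}))
# {'P1': 1.0, 'P2': 0.25}
```

A ratio such as P1's, one perceptual intention per task, would point the analyst to the protocols to see whether the cause lies in the display design or in the user's chosen strategy.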

A method for recording these indicators is presented below.

2.5  PROCESS  FOR  IDENTIFYING  AND  RECORDING  INDICATORS

To apply this USI evaluation method efficiently, the indicators extracted from the encoded data need to be recorded. Additionally, to assess whether USI design changes are called for, information is required not only on each user's performance but also on the number of users who experience problems in the same USI area and the similarity of their problems. To address these issues, the following data summarization table was developed. This table serves several purposes. First, it aids in extracting the indicators from the encodings for each user, particularly the number of steps per task intention, the number of actions per step, the reason for each intention, and the frequency and type of task intentions and intentions to execute. It also summarizes the number and types of errors made during each intention, and it makes the evaluation of the success of each intention readily apparent.


Int.task | Freq | Int.exe (step) | Freq | # actions per Int.exe | Eval of Int.exe | Eval of Int.task | Errors | Comments
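The rows of the summarization table can be generated from the encoded data rather than filled in by hand. The sketch below produces one row per (task intention, step) pair in the column order of the table above. It assumes a hypothetical record layout (`TaskRecord`, `StepRecord`), counts step frequency within the same task intention, and leaves the Comments column empty for the analyst.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class StepRecord:
    name: str        # intention to execute
    actions: int     # number of articulatory actions
    evaluation: str  # Eval of the intention to execute
    errors: List[str] = field(default_factory=list)


@dataclass
class TaskRecord:
    name: str        # task intention
    evaluation: str  # Eval of the task intention
    steps: List[StepRecord] = field(default_factory=list)


def summary_rows(tasks: List[TaskRecord]) -> List[Tuple]:
    """One row per (task intention, step) pair, in the column order of the table."""
    task_freq = Counter(t.name for t in tasks)
    groups: Dict[Tuple[str, str], Dict[str, list]] = {}  # dicts keep insertion order
    for t in tasks:
        for s in t.steps:
            g = groups.setdefault((t.name, s.name),
                                  {"actions": [], "step_evals": [], "task_evals": [], "errors": []})
            g["actions"].append(s.actions)
            g["step_evals"].append(s.evaluation)
            g["task_evals"].append(t.evaluation)
            g["errors"].extend(s.errors)
    return [
        (task, task_freq[task], step, len(g["actions"]), g["actions"],
         "/".join(g["step_evals"]), "/".join(g["task_evals"]), ", ".join(g["errors"]), "")
        for (task, step), g in groups.items()
    ]


tasks = [TaskRecord("schedule mission A", "OK", [
    StepRecord("open schedule form", 3, "OK"),
    StepRecord("enter time", 4, "OK", ["Err.exec", "Rec.err"]),
])]
for row in summary_rows(tasks):
    print(row)
```

As with the paper-based table, the sequencing of events is not preserved in these rows; it remains available in the encoded records themselves.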

The tables also summarize each user's performance for comparison across subjects. For each goal, the usability study conductors can assess whether many users or only one user had difficulties with certain goals, and whether their problems were similar or different.
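The cross-user comparison can start from each user's goal-level Eval codes. The sketch below lists, for every goal, the users whose evaluation was anything other than "OK". The `goal_evals` layout, the participant labels, and `users_with_difficulty` are hypothetical; judging whether the users' problems were similar still requires reading their error codes and protocols.

```python
from typing import Dict, Set


def users_with_difficulty(goal_evals: Dict[str, Dict[str, str]]) -> Dict[str, Set[str]]:
    """goal_evals[user][goal] -> Eval code; returns goal -> users whose Eval was not "OK"."""
    difficulty: Dict[str, Set[str]] = {}
    for user, goals in goal_evals.items():
        for goal, outcome in goals.items():
            if outcome != "OK":
                difficulty.setdefault(goal, set()).add(user)
    return difficulty


evals = {
    "P1": {"Goal 1": "OK", "Goal 2": "Inc"},
    "P2": {"Goal 1": "OK", "Goal 2": "abort"},
    "P3": {"Goal 1": "wrong", "Goal 2": "OK"},
}
for goal, users in sorted(users_with_difficulty(evals).items()):
    print(goal, sorted(users))
# Goal 1 ['P3']
# Goal 2 ['P1', 'P2']
```

In this made-up example, Goal 2 is a shared problem and the more likely candidate for a design change, while Goal 1 affected a single user.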
A disadvantage of the table is that the sequencing of events is lost. This information, however, is available in the encoded files. The comments column can be used to record indicators of interest which are not inherent in the format, such as the user's stated desires obtained from protocols, the user's comments on the USI, more complete descriptions of errors, notes on low-level repetitious actions in the data, etc.

Finally, to apply this method efficiently, a determination needs to be made of whether an interface design change is required to fix a problem or whether the problem is due to user inexperience with the system. For example, if indicators show a user is performing a task inefficiently, it could be due either to indirectness in the interface or to the user being unaware of a more efficient method. While we would like all systems to be user-centered and every interaction to be intuitively obvious to the user without any training, this is not likely to be the case for the near future. With our evaluation method, the ability to easily compare users' performance at an interaction level allows assessments of whether problems are specific to a single user or are occurring across users. Each situation requires interpretation on the part of the usability analyst. Changes in users' performance as they learn and become more experienced with the system can also be tracked with this method.

2.6  SUMMARY 

Indicators that various types of semantic and articulatory indirectness are present in a system were derived. By collecting and integrating both protocols and history files, a fairly complete description of the human-computer interaction process is provided, so identification of the indicators of indirectness can be fully supported. To enable extraction and quantification of the indicators, an encoding scheme was developed. Finally, a data summarization table was developed for recording metrics from the encoded data. The table allows for ease of comparison of indicators across users. With this method, we feel that a more complete graphical user interface evaluation is possible than with any of the existing, traditionally used evaluation methods. In the next section, we describe an actual usability study that we conducted and apply this methodology to analyze the collected data.
SECTION 3

APPLICATION OF THE EVALUATION METHODOLOGY TO THE MILITARY AIRSPACE MANAGEMENT SYSTEM


In this section, the evaluation methodology is applied to the data collected during a usability study performed on the Military Airspace Management System (MAMS) prototype. We discuss the steps taken to format the data for application of the encodings, the tool used to aid in applying the encoding language, a summary of the USI measures of effectiveness extracted from the data, and the USI implications.

3.1  MILITARY  AIRSPACE  MANAGEMENT  SYSTEM  (MAMS)  USABILITY 
STUDY 

A usability study was conducted on a MAMS prototype. The MAMS displays and forms are described in MTR 92B00(XX)47 vol. 2; only the main MAMS display is shown here, in figure 6. The MAMS usability study was selected to test this methodology because it possesses a graphical, direct-manipulation style interface. The interface was implemented with Motif.

3.1.1  Test  Participants 

Six test participants were used in the MAMS usability study. Four were from the Air Force, one was from the Navy, and one was from the Marines. Five of the six test participants had previous airspace scheduling experience, and four of the six had participated in the MAMS users group, which had contributed to the system requirements. Additionally, the MAMS USI designer participated in the evaluation to provide a baseline of expert performance in terms of user interface familiarity.

3.1.2  Apparatus 

The test sessions were conducted in the User-System Interface Technology Testbed (USITT) Evaluation Facility in MITRE M-Building. The MAMS prototype was hosted on a Sun workstation. The software was specially instrumented to collect time-stamped user inputs. A tripod-mounted Panasonic video camera with time and date stamps was used to videotape the display for each participant.

3.1.3  Procedure 

Test participants were scheduled for a one-day test session beginning with a 90-minute training session and a pre-test questionnaire. The training session provided detailed information and a demonstration of all applicable topics. A definition of a "good" schedule versus a "bad" schedule for the purposes of the usability test was provided. The test session consisted of the participants completing the tasks outlined in the test scenario (see Appendix A) and completing a post-test questionnaire. To keep the testing period to a reasonable length, only a subset of the total MAMS functionality was tested. A basic scheduling scenario was developed that incorporated the core tasks of airspace scheduling: approving mission requests, entering missions into the schedule, resolving mission schedule conflicts, finding mission data in the schedule, and generating mission reports. Additionally, test participants were asked to build groupings, or folders, of airspaces to be used to facilitate repeated data entry procedures. Test participants were encouraged to work at their own speed and were given breaks whenever they requested them. A pilot study had been conducted previously to verify the training and test procedures.

Test participants were provided with copies of the training materials and a MAMS prototype users guide for their use during the test session. They were also provided with a quick reference template which presented information on keyboard accelerators, as well as examples of valid date and time entries. Test administrators were available via intercom to assist test participants when they requested help.

3.2 DATA SELECTED FOR FURTHER ANALYSIS

During some of the test sessions, software problems were encountered and the system failed; this is to be expected when using a prototype. In some instances, the keystrokes before the software failure were lost, and in one case the system failed twice and all of the data between the failures was lost. As a result, and because of time constraints for completing this phase of the research, the data for the two participants with data loss were excluded from extensive analysis. The USI design expert's data was analyzed to provide a baseline of expert system (not task) performance. This made a total of five data files for analysis (referred to as participants 1-5). This number is consistent with current research on the number of participants required for usability testing: Virzi (1992) found that 80% of all usability problems are detected with four or five participants, and that the most severe problems are detected in the first few participants.
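Virzi's result is commonly expressed through the cumulative problem-discovery model 1 - (1 - p)^n, where p is the probability that a single participant encounters a given problem and n is the number of participants. A minimal Python sketch of that model follows; the value p = 0.3 is purely illustrative and was not measured in the MAMS study.

```python
def proportion_found(p, n):
    """Expected share of problems found by n participants, assuming each
    problem is found by any one participant with independent probability p."""
    return 1 - (1 - p) ** n


def participants_needed(p, target):
    """Smallest n whose expected share of problems found reaches target."""
    if not (0 < p <= 1 and 0 < target < 1):
        raise ValueError("need 0 < p <= 1 and 0 < target < 1")
    n = 1
    while proportion_found(p, n) < target:
        n += 1
    return n


for n in range(1, 7):
    print(n, round(proportion_found(0.3, n), 2))
print(participants_needed(0.3, 0.8))  # 5
```

Under this assumed p, four participants are expected to find about 76% of the problems and five about 83%, in line with the four-to-five figure cited above.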

3.3 DATA FILTERING AND INTEGRATION


The first step in the data analysis process was to get the data into a form amenable to applying the encoding scheme. As was mentioned earlier, the process of combining history file data with protocol data is challenging and manual in nature. An excerpt of a raw history file, collected by instrumenting the MAMS software, is shown below.


User Action             Time       Object
ButtonPress Button1     11:32:39   viewCascadeButton  View
FocusIn                 11:32:39   mainWindow
FocusIn                 11:32:39   mainWindowForm
FocusIn                 11:32:39   mainMenuBar
FocusIn                 11:32:39   mainWindow
FocusIn                 11:32:39   menuShell2
ButtonRelease Button1   11:32:40   changeDatePushBu  Set Date and Times.
FocusIn                 11:32:40   MAMS
FocusIn                 11:32:41   ScreenSetup_popup
ButtonPress Button1     11:32:43   screenSetupCancel
ButtonRelease Button1   11:32:43   screenSetupCancel

The raw history file needed to be filtered to reduce its size; this was done by removing extraneous data such as key releases that are preceded by a keypress. The filter also combined typed text into a string and summarized dragging actions to eliminate the description of every location an item was dragged across. The filter also produced elapsed times, which reflect the user's delay between keystrokes plus the computer response time from the previous input.
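The filtering steps just described can be sketched as follows. This is a hypothetical reconstruction, not the filter used in the study: it assumes raw events arrive as (time, event, detail, widget) tuples, treats FocusIn/FocusOut as extraneous (the filtered excerpt below shows no focus events), and uses a hypothetical "Motion" event whose detail is a screen location. The real filter also mapped widget names to descriptive labels, which is omitted here.

```python
from datetime import datetime

# Assumption: focus changes are window-system bookkeeping, not user inputs.
EXTRANEOUS = {"FocusIn", "FocusOut"}


def to_seconds(hms):
    """Convert an 'HH:MM:SS' timestamp to seconds since midnight."""
    t = datetime.strptime(hms, "%H:%M:%S")
    return t.hour * 3600 + t.minute * 60 + t.second


def filter_history(raw):
    """Filter raw (time, event, detail, widget) tuples into numbered inputs.

    Returns (line, time, elapsed, description) tuples; elapsed is the gap in
    seconds since the previous retained input."""
    steps = []       # one dict per retained user input
    pressed = set()  # keys whose KeyPress has been seen but not yet released
    for time, event, detail, widget in raw:
        if event in EXTRANEOUS:
            continue
        last = steps[-1] if steps else None
        if event == "KeyRelease" and detail in pressed:
            pressed.discard(detail)  # release of a recorded press adds nothing
            continue
        if event == "KeyPress":
            pressed.add(detail)
            if last and last["kind"] == "typed" and last["widget"] == widget:
                last["text"] += detail  # extend the typed string
            else:
                steps.append({"time": time, "kind": "typed",
                              "widget": widget, "text": detail})
            continue
        if event == "Motion":  # hypothetical drag event; detail is a location
            if last and last["kind"] == "dragged" and last["widget"] == widget:
                last["end"] = detail  # keep only the final location
            else:
                steps.append({"time": time, "kind": "dragged", "widget": widget,
                              "start": detail, "end": detail})
            continue
        steps.append({"time": time, "kind": event, "widget": widget})

    rows, prev = [], None
    for n, step in enumerate(steps, start=1):
        secs = to_seconds(step["time"])
        elapsed = 0 if prev is None else secs - prev
        prev = secs
        if step["kind"] == "typed":
            desc = f'Typed "{step["text"]}" in {step["widget"]}'
        elif step["kind"] == "dragged":
            desc = f'Dragged {step["widget"]} from {step["start"]} to {step["end"]}'
        else:
            desc = f'{step["kind"]} on {step["widget"]}'
        rows.append((f"{n:03d}", step["time"], f"{elapsed:03d}", desc))
    return rows


raw = [  # startDateField is a hypothetical widget name
    ("11:32:39", "ButtonPress", "Button1", "viewCascadeButton"),
    ("11:32:39", "FocusIn", "", "mainWindow"),
    ("11:33:13", "KeyPress", "1", "startDateField"),
    ("11:33:13", "KeyRelease", "1", "startDateField"),
    ("11:33:13", "KeyPress", "3", "startDateField"),
    ("11:33:13", "KeyRelease", "3", "startDateField"),
    ("11:33:14", "KeyPress", "3", "startDateField"),
    ("11:33:14", "KeyRelease", "3", "startDateField"),
]
for row in filter_history(raw):
    print(*row)
# 001 11:32:39 000 ButtonPress on viewCascadeButton
# 002 11:33:13 034 Typed "133" in startDateField
```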

The instrumented software used for this study had some limitations in the user actions that were recorded, and some user inputs were not captured. Actions not recorded included the specific item name that was selected within a list, the characters that were deleted within a field, and enlarging or reducing dialog boxes to improve visibility. Double and triple mouse clicks were recorded as single clicks, and the videotape was used to differentiate between them. Some inputs were ambiguous; for example, "pressed the space bar" may indicate a typed space or the deletion of the contents of a field. Again, the videotape was used to determine the purpose of the input.

After filtering, the history file was as shown below. The data items are: line number, time of execution, elapsed time in seconds, and the filtered user input.

001  11:32:39  000  Pressed Button on View Button in Main Menu Bar
002  11:32:41  002  Released Button on Date Button in View Menu
003  11:32:43  002  Pressed Button on Cancel Button in Date Dialog
004  11:32:45  002  Pressed Button on View Button in Main Menu Bar
005  11:32:47  002  Released Button on Change Layout Button in View Menu
006  11:32:52  005  Pressed Button on Undisplayed SUA List in General Layout Dialog
007  11:32:58  006  Pressed Button on Add Button in General Layout Dialog
008  11:32:59  001  Pressed Button on Undisplayed SUA List in General Layout Dialog
009  11:33:01  002  Pressed Button on Add Button in General Layout Dialog
010  11:33:02  001  Pressed Button on OK Button in General Layout Dialog
011  11:33:05  003  Pressed Button on View Button in Main Menu Bar
012  11:33:10  005  Pressed Button on Date Field in Date Dialog
013  11:33:13  003  Typed "133" in Start Date Field in Date Dialog
014  11:33:39  026  Pressed Button on Day Field in Date Dialog
015  11:33:40  001  Typed "7" in Day Field in Date Dialog
016  11:33:45  005  Pressed Button on OK Button in Date Dialog
To integrate the history file with the "voiced-aloud thoughts," the filtered history files were printed out, the videotapes were viewed, and the verbal protocols were manually transcribed onto paper in the appropriate location on the history file. The problems experienced by the participants were also noted and recorded. The protocols were then typed into the filtered history files using the EMACS editor in UNIX. Below is an example of the integrated history/transcribed-protocols data file:

"Alright. Okay, so I want to see that week."

001  11:32:39  000  Pressed Button on View Button in Main Menu Bar
002  11:32:41  002  Released Button on Date Button in View Menu
003  11:32:43  002  Pressed Button on Cancel Button in Date Dialog
"Well, I probably need airspaces up there first."

004  11:32:45  002  Pressed Button on View Button in Main Menu Bar
005  11:32:47  002  Released Button on Change Layout Button in View Menu
"Who am I again? Phoenix"

006  11:32:52  005  Pressed Button on Undisplayed SUA List in General Layout Dialog
"Ah, Yankee 1."

007  11:32:58  006  Pressed Button on Add Button in General Layout Dialog
008  11:32:59  001  Pressed Button on Undisplayed SUA List in General Layout Dialog
"Ah, Yankee 2."

009  11:33:01  002  Pressed Button on Add Button in General Layout Dialog
010  11:33:02  001  Pressed Button on OK Button in General Layout Dialog
011  11:33:05  003  Pressed Button on View Button in Main Menu Bar
"Okay, now I want to see my dates. Start date of..."

012  11:33:10  005  Pressed Button on Date Field in Date Dialog
013  11:33:13  003  Typed "133" in Start Date Field in Date Dialog
014  11:33:39  026  Pressed Button on Day Field in Date Dialog
"For 7 days."

015  11:33:40  001  Typed "7" in Day Field in Date Dialog
"For duration of five hours I will keep, cause I like to see five hours."

016  11:33:45  005  Pressed Button on OK Button in Date Dialog
"Okay, oops, forgot to change the month."

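In this study the protocols were placed by hand. Because the videotape carried time stamps, the placement could in principle be automated; the Python sketch below is a hypothetical illustration, assuming each utterance has been transcribed together with the video time at which it began (the utterance times shown are assumed for the example, not taken from the MAMS tapes).

```python
import bisect


def to_seconds(hms):
    """Convert an 'HH:MM:SS' time stamp to seconds since midnight."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


def integrate(history, utterances):
    """Interleave time-stamped utterances with filtered history lines.

    history: (line, 'HH:MM:SS', elapsed, description) tuples, in time order.
    utterances: ('HH:MM:SS', text) pairs, in time order.
    Each utterance goes before the first history line timed after it."""
    times = [to_seconds(row[1]) for row in history]
    slots = [[] for _ in range(len(history) + 1)]  # slot i precedes line i
    for stamp, text in utterances:
        slots[bisect.bisect_right(times, to_seconds(stamp))].append(text)
    merged = []
    for i, row in enumerate(history):
        merged.extend(f'"{text}"' for text in slots[i])
        merged.append("  ".join(row))
    merged.extend(f'"{text}"' for text in slots[-1])
    return merged


history = [
    ("001", "11:32:39", "000", "Pressed Button on View Button in Main Menu Bar"),
    ("002", "11:32:41", "002", "Released Button on Date Button in View Menu"),
    ("003", "11:32:43", "002", "Pressed Button on Cancel Button in Date Dialog"),
    ("004", "11:32:45", "002", "Pressed Button on View Button in Main Menu Bar"),
]
utterances = [  # utterance times are assumed for illustration
    ("11:32:38", "Alright. Okay, so I want to see that week."),
    ("11:32:44", "Well, I probably need airspaces up there first."),
]
for line in integrate(history, utterances):
    print(line)
```

With these assumed times, the two utterances land in the same positions as in the hand-integrated excerpt above. Because the history file records only one-second resolution, an analyst would still need to check placements where speech and input coincide.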
3.4  APPLYING  THE  ENCODING  SCHEME  TO  THE  INTEGRATED  DATA 
FILES 

Working from hardcopies of the integrated data files, we viewed the video a second time and applied the encoding scheme to the hardcopies. The integrated files were then transferred from the Sun workstation to the IBM PC. Next, the encoding language was applied to the integrated data files using the SHAPA software tool.
3.4.1 Encoding with SHAPA


SHAPA (Software for Heuristically Aiding Protocol Analysis) was developed at the University of Illinois at Urbana-Champaign Engineering Psychology Research Laboratory (Sanderson et al., 1989). SHAPA is a protocol analysis environment in which researchers can encode data with any encodings they choose. Data can be coded in a variety of ways, depending on what questions need to be answered, and analysis can occur at many levels. SHAPA supports sequential data analysis of encoded protocol segments; the analysis techniques it supports include transition matrix analysis, lag sequential analysis, and frequency cycles. SHAPA works on single-stream, un-timestamped verbal and non-verbal protocols, and it runs on an IBM PC or compatible.
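Transition matrix and lag sequential analyses rest on counting which code follows which. A minimal Python sketch of that counting, assuming a single ordered list of codes; it is not SHAPA's implementation and omits any statistical testing. Setting `lag` above 1 gives the raw counts behind a lag sequential analysis.

```python
from collections import Counter, defaultdict


def transition_matrix(codes, lag=1):
    """Estimate P(code at i + lag is b | code at i is a) from one sequence."""
    pairs = Counter(zip(codes, codes[lag:]))
    row_totals = defaultdict(int)
    for (a, _), count in pairs.items():
        row_totals[a] += count
    return {(a, b): count / row_totals[a] for (a, b), count in pairs.items()}


# Predicate sequence of Figure 7, with the three error predicates merged into
# one ERR code for brevity.
seq = ["GOAL", "INT.TASK", "INT.EXEC", "ERR", "EVALUATE", "INT.EXEC",
       "EVALUATE", "INT.EXEC", "ERR", "ERR", "EVALUATE"]
m = transition_matrix(seq)
print(round(m[("INT.EXEC", "ERR")], 2))  # 0.67
```

In this short excerpt, two of the three intentions to execute are followed directly by an error code; across a full session, such transition probabilities can point to the steps where users most often go wrong.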

To encode the data with SHAPA, predicates (names of the codes to be applied) are specified. These are our task intention, intention to execute, etc., as defined in tables 5 and 6. To the articulatory-level encodings, we added two application-specific codes: manipulations of the timebar (which affected how much of a schedule was viewed) and manipulations of the mission icons. The mission icons could be moved by dragging with the mouse and selected by clicking on them. Once a mission is selected, information about it appears in an information field, and selected commands (approving, denying, editing, etc.) can then be applied to it.

Each predicate, which is general, can also have a user-defined value, which is a specific instance or further description of the predicate. Figure 7, an excerpt of the semantic level encodings from the MAMS study, illustrates many of these concepts. For the predicate INT.TASK, 1-1-setdate was the value descriptor: 1-1 means this is the user's first task intention for goal 1, and setdate means the task was to set the date and time of the schedule. The SHAPA screen is laid out like the figure: the data file is displayed in the right column of the screen, and the corresponding encodings are typed in the left column. SHAPA prompts for correct syntax based on the user's predefined predicate and value list. The encoded files can then be printed out, with an option to also print the protocols and keystrokes around the encodings. An articulatory level encoding is shown in figure 8. The complete set of predicates and values used in the MAMS study is provided in Appendix B.
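The value descriptors follow a regular pattern: an optional error identifier (e.g. e1*), the goal number, an optional task number and execute number, a name, and an optional outcome or note. The Python sketch below decodes codes written in that style, assuming the syntax seen in Figures 7 and 8; the regular expressions are ours, not SHAPA's, and Appendix B remains the authoritative list of predicates and values.

```python
import re

CODE = re.compile(r"^([A-Z_.]+)\((.*)\)$")
VALUE = re.compile(
    r"^(?:(e\d+)\*)?"           # optional error id, e.g. e1*
    r"(\d+)"                     # goal number
    r"(?:-(\d+)(?:\.(\d+))?)?"   # optional task number and execute number
    r"-([A-Za-z]+)"              # name, e.g. setdate
    r"(?:-(.+))?$"               # optional outcome or note, e.g. abort
)


def parse_code(text):
    """Split 'PREDICATE(value)'; hierarchical values are decoded into parts."""
    match = CODE.match(text.strip())
    if not match:
        raise ValueError(f"not a PREDICATE(value) code: {text!r}")
    predicate, value = match.groups()
    parts = VALUE.match(value)
    if not parts:  # articulatory values such as MENU(view,in)
        return {"predicate": predicate, "value": value}
    err, goal, task, step, name, note = parts.groups()
    return {"predicate": predicate, "error": err, "goal": int(goal),
            "task": int(task) if task else None,
            "exec": int(step) if step else None,
            "name": name, "note": note}


print(parse_code("EVALUATE(1-1.2-setlayout-ok)"))
print(parse_code("ERR.EVAL(e1*1-1.1-setdate-thought needed airspaces on display)"))
```

Decoding the goal, task, and execute numbers in this way is what makes the hierarchical roll-ups in the summary data tables possible.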
GOAL(1-setdate)
INT.TASK(1-1-setdate)
INT.EXEC(1-1.1-setdate)
ERR.EVAL(e1*1-1.1-setdate-thought needed airspaces on display)
EVALUATE(1-1.1-setdate-abort)
INT.EXEC(1-1.2-setlayout)
EVALUATE(1-1.2-setlayout-ok)
INT.EXEC(1-1.1-setdate)
ERR.EXEC(e2*1-1.1-setdate-typo)
ERR.ACSP(e3*1-1.1-setdate-forgotsetmonth)
EVALUATE(1-1.1-setdate-inc)

"Alright. Okay, so I want to see that week."
001  11:32:39  000  Pressed Button on View Button in Main Menu Bar
002  11:32:43  004  Pressed Button on Cancel Button in Date Dialog
"Well, I probably need airspaces up there first."
003  11:32:45  002  Pressed Button on View Button in Main Menu Bar
004  11:32:47  002  Released Button on Change Layout Button in View Menu
"Who am I again? Phoenix"
005  11:32:52  005  Pressed Button on Undisplayed SUA List in General Layout Dialog
"Ah, Yankee 1."
006  11:32:58  006  Pressed Button on Add Button in General Layout Dialog
007  11:32:59  001  Pressed Button on Undisplayed SUA List in General Layout Dialog
"Ah, Yankee 2."
008  11:33:01  002  Pressed Button on Add Button in General Layout Dialog
009  11:33:02  001  Pressed Button on OK Button in General Layout Dialog
010  11:33:05  003  Pressed Button on View Button in Main Menu Bar
"Okay, now I want to see my dates. Start date of..."
011  11:33:10  005  Pressed Button on Date Field in Date Dialog
012  11:33:13  003  Typed "133" in Start Date Field in Date Dialog
013  11:33:39  026  Pressed Button on Day Field in Date Dialog
"For 7 days."
014  11:33:40  001  Typed "7" in Day Field in Date Dialog
"For duration of five hours I will keep, cause I like to see five hours."
015  11:33:45  005  Pressed Button on OK Button in Date Dialog
"Okay, oops, forgot to change the month."


Figure  7.  Application  of  Semantic  Level  Encodings 


GOAL(1-setdate)
INT.TASK(1-1-setdate)
INT.EXEC(1-1.1-setdate)
MENU(view,in)
COMMAND(date,in)
BUTTON(date-cancel)
EVALUATE(1-1.1-setdate-abort)
INT.EXEC(1-1.2-setlayout)
MENU(view,in)
COMMAND(layout,in)
LIST_SELECT(layout-undis)
BUTTON(layout-add)
LIST_SELECT(layout-undis)
BUTTON(layout-add)
BUTTON(layout-ok)
EVALUATE(1-1.2-setlayout-ok)
INT.EXEC(1-1.1-setdate)
MENU(view,in)
COMMAND(date,in)
FIELD(date-date-edit)
FIELD(date-durdays-edit)
BUTTON(date-ok)
EVALUATE(1-1.1-setdate-inc)

"Alright. Okay, so I want to see that week."
001  11:32:39  000  Pressed Button on View Button in Main Menu Bar
002  11:32:43  004  Pressed Button on Cancel Button in Date Dialog
"Well, I probably need airspaces up there first."
003  11:32:45  001  Pressed Button on View Button in Main Menu Bar
004  11:32:47  002  Released Button on Change Layout Button in View Menu
"Who am I again? Phoenix"
005  11:32:52  005  Pressed Button on Undisplayed SUA List in General Layout Dialog
"Ah, Yankee 1."
006  11:32:58  006  Pressed Button on Add Button in General Layout Dialog
007  11:32:59  001  Pressed Button on Undisplayed SUA List in General Layout Dialog
"Ah, Yankee 2."
008  11:33:01  002  Pressed Button on Add Button in General Layout Dialog
009  11:33:02  001  Pressed Button on OK Button in General Layout Dialog
010  11:33:05  003  Pressed Button on View Button in Main Menu Bar
"Okay, now I want to see my dates. Start date of..."
011  11:33:10  005  Pressed Button on Date Field in Date Dialog
012  11:33:13  003  Typed "133" in Start Date Field in Date Dialog
013  11:33:39  026  Pressed Button on Day Field in Date Dialog
"For 7 days."
014  11:33:40  001  Typed "7" in Day Field in Date Dialog
"For duration of five hours I will keep, cause I like to see five hours."
015  11:33:45  005  Pressed Button on OK Button in Date Dialog
"Okay, oops, forgot to change the month."


Figure 8. Articulatory Level Encoding Example
3.4.2 Evaluation of the Data Transformation Process


The times required for the data transformation process are summarized below. Table 7 shows approximate times for the different parts of the data transformation process for each participant. The times for participant 1 are high because we were experimenting with techniques and researching various encoding schemes; we improved the techniques further after analyzing the data for participant 2. The difficulty of the data transformation process points to the need for a specialized tool to support this process, which is currently scheduled to be developed in FY'93. This is discussed further in section 4.


Table 7. Data Transformation Times (approximate, in hours)

Participant #         Transcribe     Type in protocol,    Manually apply   Enter data
                      protocol       split the files      the encodings    into SHAPA
1 (technique          42 (with       12                   >90              30
  experimentation)    keystrokes)
2                     18             18 (before filter)   42               30
3                     15             3.5                  18               24
4                     15             3.5                  15               15
5 (expert)            No protocols   1.5                  6                8
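Totalling each row of Table 7 makes the overall cost, and its decline as the techniques matured, easier to see. A small Python sketch that sums the rows; entering ">90" as 90 and the expert's missing transcription as 0 are our assumptions, so participant 1's total is a lower bound.

```python
# Hours per step from Table 7, in column order: transcribe protocol, type in
# protocol and split the files, manually apply the encodings, enter into SHAPA.
# ">90" is entered as its lower bound; participant 5 had no protocols (0).
HOURS = {
    "1": [42, 12, 90, 30],
    "2": [18, 18, 42, 30],
    "3": [15, 3.5, 18, 24],
    "4": [15, 3.5, 15, 15],
    "5 (expert)": [0, 1.5, 6, 8],
}


def total_hours(hours):
    """Sum the four data transformation steps for each participant."""
    return {participant: sum(steps) for participant, steps in hours.items()}


for participant, total in total_hours(HOURS).items():
    print(f"participant {participant}: {total} h")
```

The totals come to at least 174, then 108, 60.5, 48.5, and 15.5 hours; the expert's figure is lower partly because there were no protocols to transcribe.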

3.5  TRADITIONAL  PERFORMANCE  MEASURE  RESULTS 

As  was  mentioned  earlier,  it  is  still  of  interest  to  track  the  traditional  performance  measures. 
The  total  time  used  by  each  participant  to  complete  the  test  scenario  is  shown  below  in  table  8. 
This  is  followed  by  the  completion  success  of  each  of  the  ten  goals  in  the  scenario,  table  9. 

As expected, while we learn some useful information about the system from these global measures (for example, that most participants had trouble successfully completing goal 5), we do not learn anything about the type of difficulty users experienced or how the system needs to be improved. There is not enough granularity in this type of data.
Table 8. Traditional Performance Measure for Each Participant


Participant 1

Participant 2

Participant 3

Participant  5 

Comp 

Comp 

Comp 

Comp 

Comp 

Inc.  (Did  not 

schedule W

zones.)

Inc.  (Did  not 

schedule W

zones.) 

Inc. (Did not
schedule

124002S  or  any 
new  missions.) 

Inc.  (Due  to 
crash,  chose  not 
to re-schedule
missions.) 

Comp 

Comp 

Comp 

Inc.  (Did  not 
press  create 
button.) 

Comp 

Comp 

Comp 

Comp 

Comp 

Comp 

Inc.  (Did  not 
press  create 
button.)

Inc.  (Did  not 
press  create 
button.) 

Comp 

Comp 

Comp 

Inc.  (Ftdders 
incmreafiom 
Goals  2, 3,  and 

4.  Wanted  to 
leave.) 

Comp 

Comp 

Comp 

Comp 

Comp 

Comp 

Comp 

Comp 

Comp 

Comp 

Comp 

Comp 

Comp 

Comp 

Comp 

Comp 

Comp 

Inc.  (Forgot 
Canyon  Run. 

Did  not  press 
print  button  for 
r54). 

Wrong (Printed
NqKune  instead 
of  Phoenix.) 

Inc.  (Told  not 
to  print  r54.) 

Inc.  (Did  not 
press  print 
button  for  r54.) 

Comp 

2.5  hrs 

4.0  hrs 

3.0  hrs 

2.5  hrs 

1.25  hrs 

3.6  EXTRACTING  THE  USI  INDICATORS  AND  OTHER  PERFORMANCE 
MEASURES 


From the encoded data files, measures and indicators of USI effectiveness can be extracted and summarized. We have experimented with only a few of the potential analysis techniques that could be applied to the encoded data. Our goal is to find the most useful analysis techniques and to automate their application as much as possible.

3.6.1  Measures  Extractable  from  the  Summary  Data  Tables 

The summary data table was completed for each participant's data. This was performed manually, as no tool supports extracting measures such as the number of actions per step and the number of steps per task from the data. SHAPA does, however, do frequency counts, so the frequencies with which the predicates (e.g., task intentions and intentions to execute) occur were generated as a check that none were missed. The hierarchical encoding structure facilitated the extraction of these measures. The full data summary table for participant 2 is given in Appendix C (others can be obtained upon request), and a sample excerpt from participant 1 is included in figure 9. It is interesting to note that examining the user interface expert's data provides us with information on the best the system can do. The users' performance provides us with information on more realistic intentions, and on whether they found that the system implements functions in a direct and easy-to-use fashion. If a problem can be identified from the interface expert's data, it should be fixed, as it will definitely affect all the users.

There is a data summary table for each goal, which corresponds to one task from the task scenarios provided to the users as part of the usability test. The first column, with heading int.task, contains a list of all the user's task intentions while performing the goal. The second column contains the number of times each task intention was performed. The third column contains the intentions to execute for each task intention (the computer step or method used to accomplish the task), followed in the fourth column by the number of times each step was performed. The fifth column contains the number of computer actions, at a user-interface object level, that were performed to accomplish each intention to execute. This is followed in the sixth column by the evaluation of the success of each intention to execute. The column labeled evaluation of task intention (eval of int.task) provides the evaluation of the success of each of the task intentions. This is followed by a listing of the errors, and their types, that occurred during the task, and then by the usability analyst's comments. The comments could include information on the causes of errors, more details on errors, interesting user comments made during a task (particularly when stating desires for missing functions), trends noted in the data, reasons for incompletes on tasks, etc.
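Because the encodings are hierarchical, several of these columns (task-intention frequency, the intentions to execute, and actions per intention to execute with its evaluation) can be derived mechanically from the ordered code list. In the study this was done by hand; the Python sketch below is a hypothetical illustration, assuming codes are available as (predicate, value) pairs and that every predicate other than GOAL, INT.TASK, INT.EXEC, EVALUATE, and the ERR.* codes counts as one user-interface action. The example input is the articulatory encoding of Figure 8.

```python
from collections import Counter


def summarize(codes):
    """Derive summary-table columns from (predicate, value) pairs in order.

    Returns (task-intention frequencies, [(int.exec, #actions, evaluation)])."""
    task_freq = Counter()
    execs = []  # [int.exec value, action count, evaluation]
    for predicate, value in codes:
        if predicate == "INT.TASK":
            task_freq[value.split("-", 2)[2]] += 1  # '1-1-setdate' -> 'setdate'
        elif predicate == "INT.EXEC":
            execs.append([value, 0, None])
        elif predicate == "EVALUATE":
            if execs:
                execs[-1][2] = value.rsplit("-", 1)[1]  # e.g. 'abort'
        elif predicate == "GOAL" or predicate.startswith("ERR."):
            continue  # structure and errors are not counted as actions
        elif execs:
            execs[-1][1] += 1  # one user-interface action
    return task_freq, [tuple(e) for e in execs]


codes = [
    ("GOAL", "1-setdate"), ("INT.TASK", "1-1-setdate"),
    ("INT.EXEC", "1-1.1-setdate"), ("MENU", "view,in"), ("COMMAND", "date,in"),
    ("BUTTON", "date-cancel"), ("EVALUATE", "1-1.1-setdate-abort"),
    ("INT.EXEC", "1-1.2-setlayout"), ("MENU", "view,in"),
    ("COMMAND", "layout,in"), ("LIST_SELECT", "layout-undis"),
    ("BUTTON", "layout-add"), ("LIST_SELECT", "layout-undis"),
    ("BUTTON", "layout-add"), ("BUTTON", "layout-ok"),
    ("EVALUATE", "1-1.2-setlayout-ok"),
    ("INT.EXEC", "1-1.1-setdate"), ("MENU", "view,in"), ("COMMAND", "date,in"),
    ("FIELD", "date-date-edit"), ("FIELD", "date-durdays-edit"),
    ("BUTTON", "date-ok"), ("EVALUATE", "1-1.1-setdate-inc"),
]
task_freq, execs = summarize(codes)
print(dict(task_freq))
for row in execs:
    print(row)
```

For this excerpt the sketch yields three intentions to execute, with 3, 7, and 5 actions and evaluations abort, ok, and inc. The error listing and the analyst's comments still have to be added by hand.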

Interpreting the data requires familiarity with the goals, tasks, and system implementation. A person with that knowledge can see from the summary data tables the ease with which each part of the activity was completed. In figure 9, this user had the intention to set the time and date ("setdate") three different times, which is indicated by the 3 in the second column of the table. The first time ended in an abort due to the evaluation error; the second time ended with an incomplete due to the failure to put the time units in EST; and finally the task was completed correctly. Within the three occurrences of this task, the user had five intentions to execute. On at least one occurrence of the task intention, two or more executes were performed consecutively (indicated by the 5 int.execs for only 3 int.tasks). We can deduce from the variety and number of errors encountered during the intention to set the time and date that participant 1 had some minor difficulties in performing this intention. The problems included five errors. The first error (e1) was an error in evaluation, which caused the user to abandon a correct sequence of steps, as she thought a different task needed to precede this one. There was also an error in intention (e5), where the user forgot to perform a part of the task: when changing the time, she neglected to put the time in EST units as specified by the task scenario. She also had one error in action specification (e3), as she had intended to change the month but omitted this action. The other errors were all minor execute errors. The user recovered from all of the errors, which is indicated by the "R" following each error code.


Goal 1

Int.task      Freq  Int.exec  Freq  # actions        Eval of          Eval of          Comments
                                    per int.exec     int.exec         int.task
1-1 setdate   3     setdate   5     3, 8, 3, 7, 7    abort, inc,      abort, inc, OK   e1 - err.eval - R
                                                     inc, inc, OK                      e2 - typo - err.exec - R
                                                                                       e3 - forgot to set
Figure 9. Sample from the Summary Data Tables of Participant 1 for the Goal "Set Time and Date."


On this task, the most serious problems are the evaluation error and the intention errors. Why did the user think she needed airspaces on the screen before she could set the time and date? Did the user simply forget to put the units in EST and forget to change the month field, or did the interface design contribute to these omissions in some way (e.g., by having old default data filled in the fields, and not making it apparent to the user that she had not altered these fields)? Could the interface have prompted the user in some way, or made the current units more obvious?

After evaluating user problems for each participant on goal 1, the next step is to compare performance on goal 1 across users. As a baseline, we first check the user interface expert's performance. The interface expert took 7 actions to perform this task, with a frequency of one. A scan of the other users shows that three others nearly matched that performance, and only one other user had some errors in action specification. Based on these results, we would recommend checking whether the interface contributed to participant 1's difficulties, but overall we would conclude that the task was supported reasonably well by the interface. [Note: we later saw that this same task, when performed during other goals, did cause users difficulties, and changes were recommended. This reinforces the need for realistic task scenarios to test the USI.] As the user performance for each goal is evaluated, a sheet should be kept rating each area and noting USI problems to be considered for redesign.

As was mentioned previously, when assessing how many actions are too many, both the frequency of the action and the task itself need to be considered. If the task involves editing data in three data fields, you would expect a minimum of three actions, so 4 or 5 actions would be reasonable. If the task is to change one data field and this takes 4 or 5 actions, it is cause for concern.
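The rule of thumb above can be written as a simple check. A minimal Python sketch, assuming the minimum is the number of data fields the task must change and that more than two actions beyond that minimum is cause for concern; that slack of two is our assumption, chosen only because it reproduces both examples in the text. The weighted variant reflects the point that a small excess on a frequent intention can matter more than a large one on a rare intention.

```python
def excess_actions(observed, minimum):
    """Actions beyond the minimum the task logically requires."""
    return max(0, observed - minimum)


def flag_action_count(observed, minimum, slack=2):
    """Flag an intention to execute whose action count exceeds the minimum
    by more than `slack` actions (slack=2 is an assumed threshold)."""
    return excess_actions(observed, minimum) > slack


def weighted_excess(observed, minimum, frequency):
    """Excess actions multiplied by how often the intention occurs."""
    return excess_actions(observed, minimum) * frequency


print(flag_action_count(5, 3))  # three fields edited in 5 actions -> False
print(flag_action_count(4, 1))  # one field changed in 4 actions -> True
```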

3.6.2  Summary  Data  Table  Indicators  for  Goal  2,  Scheduling  Missions 

Next, we will examine the users' performance for scheduling missions in goal 2. Figures 10 and 11 show excerpts from the summary tables for the expert and a user, respectively. Scheduling involved reviewing the requests, approving or disapproving them, and resolving any conflicts that might have resulted. In order to evaluate how well the system supports the scheduling task, we need to understand the task. The task was to schedule missions, some of which consist of multiple parts (figure 6 of the MAMS display shows missions connected with lines, which indicate multiple-part missions). In total, there were 71 mission parts to be scheduled for goal 2.

With the MAMS prototype, there are two methods for approving requested missions. One is
to select the requested mission icon with the mouse, and select the command approve from the
schedule menu. The other is to select the requested mission icon and press 'control a', the
command by-pass equivalent to the menu command. The first method is counted as 3 actions
(select icon, select menu, select command), while the second method is counted as 2 actions
(select icon, press the keys).

What indicators of indirectness exist in the user interface expert's performance? The
expert averaged 2.1 actions per mission part. Two indicators of a problem would be two different
levels of repetition: within the task intention column and within the intention to execute column.
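As a rough arithmetic frame for the 2.1 figure, assume each of the 71 mission parts is approved exactly once and no other actions occur. The two approval methods then bound the total between 2 and 3 actions per part. The sketch below computes those bounds and the total implied by the expert's average. The once-per-part assumption ignores conflict resolution and other steps, so the comparison is only indicative.

```python
MENU_ACTIONS = 3      # select icon, select menu, select command
SHORTCUT_ACTIONS = 2  # select icon, press 'control a'
MISSION_PARTS = 71    # mission parts to be scheduled in goal 2


def approval_bounds(parts: int) -> tuple[int, int]:
    """Minimum and maximum approval actions, assuming one approval per part."""
    return parts * SHORTCUT_ACTIONS, parts * MENU_ACTIONS


low, high = approval_bounds(MISSION_PARTS)
print(low, high)                      # 142 213
print(round(2.1 * MISSION_PARTS, 1))  # 149.1
```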

The user task is to schedule all the airspace requests. After the user has approved the requests,
the system will inform the user if conflicts for airspace exist. The user then resolves the
conflicts. A sensible approach to this task, therefore, is to approve all the requests for airspace
and then resolve the identified conflicts. As can be seen in column one, approving multiple
mission requests is not possible. If it were, there would be a single "sch" task, preceded by
selection of the missions to be scheduled. Instead, we see "sch" tasks for each mission. The
user is forced to work at a lower level and form an intention to schedule each mission request
individually.
Figure 10. The User Interface Expert's Summary Data Table for the Goal "Schedule Missions".


The mission shown in lines 2-4 is an example of a multiple part mission. As seen in column 4,
the intention to execute this task was repeated three times, indicating that there was no way to
apply the schedule approve command to multiple parts of a single mission. The user is again
forced to work at a lower level than desired. This example differs from the first in that in the
first case we were not missing a possible higher-level object; in the first case we wanted to
temporarily group objects for the sole purpose of applying a command once. Here, it is
conceivable that the total mission with all its parts should be an object of its own. In fact, in
some cases the system does consider the multiple parts as a single object, such as for dragging
on the display. A conclusion we can draw so far is that the scheduling commands, such as
approve, should be reconsidered as to the level at which they can be applied.

To determine exactly how many times this Intexec -> Type -> Menu -> Command ->
Evaluate -> Evaluate -> Intexec cycle was executed, we ran a frequency of predicate cycles on the
expert's data. This counts the number of times the same sequence of events occurs. The results
revealed that this cycle occurred 62 times. Unfortunately, this analysis routine, as discussed later,
can only be run at the predicate level. From examining the data, however, we know that this cycle
is the approve mission request cycle of actions. From this data, we would assume that the system
does not support selecting multiple missions to which a single scheduling command can then be
applied; this is indeed the case.
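The frequency of predicate cycles routine described above counts how often an identical sequence of events recurs. A minimal sketch of the same idea, assuming the encoded predicates are available as a list of strings, counts every contiguous window of a given length. The predicate labels follow the cycle quoted above; the sample log is made up for illustration and is not the expert's data.

```python
from collections import Counter


def cycle_frequencies(predicates: list[str], length: int) -> Counter:
    """Count every contiguous run of `length` predicates in an encoded log."""
    windows = (tuple(predicates[i:i + length])
               for i in range(len(predicates) - length + 1))
    return Counter(windows)


APPROVE_CYCLE = ("Intexec", "Type", "Menu", "Command",
                 "Evaluate", "Evaluate", "Intexec")

# Hypothetical encoded log: two approve cycles back to back.
log = ["Intexec", "Type", "Menu", "Command", "Evaluate", "Evaluate"] * 2 + ["Intexec"]
counts = cycle_frequencies(log, len(APPROVE_CYCLE))
print(counts[APPROVE_CYCLE])  # 2
```

Because the windows overlap, a cycle that begins and ends on the same predicate (Intexec here) is counted once per repetition. This sketch, like the routine in the text, works on predicate labels only and does not see the arguments, such as mission names, attached to them.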

Figure 11 shows a real user's summary data table for this same task. We first notice the same
types of problems that the expert had. For multiple part missions such as r54-w, which has two
parts (inttask 2-32), the number of actions increases: scheduling all of r54-w takes 4 actions,
implying that the system does not consider the various parts of r54-w to be a single object. This
results in repetitive sequences of action on the user's part.


Figure 11. A User's Summary Data Table for Goal 2 "Schedule Missions".

A new pattern is also apparent in the user's data shown in figure 11: a repetitive sequence
occurs at the intention to execute level. Before approving a mission, the user performs an
execution to "look" at the mission description; this combination is represented, for example, in
task 2-27 as look1240026-w, followed by sch1240026-w. A user can look at information about
a mission in two ways. When a mission is selected on the display, some information about the
mission appears in the documentation line at the bottom of the display. More complete
information on a mission can be obtained by selecting a mission and opening the edit dialog box.
This can be performed by double clicking on the mission. This user is finding it necessary to
open the edit box via a double click to obtain information about the mission before approving it;
we know this because the look took 2 actions, a double click on the mission icon followed by the
close button. Checking across other users' data, we see the same occurrence: users feel a need
to look at the mission information contained in the dialog box before approving the mission.
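The look-then-schedule pattern can be detected mechanically once the intention to execute column is available as a sequence of labels. The sketch below assumes, hypothetically, that each label is a verb prefix ("look" or "sch") followed by a mission identifier, as in look1240026-w and sch1240026-w. It returns the missions whose approval was immediately preceded by a look at the same mission. The sample labels other than the two quoted above are invented.

```python
def look_before_schedule(intexecs: list[str]) -> list[str]:
    """Return mission IDs scheduled right after a "look" at the same mission.

    Assumes labels have the form "look<mission>" or "sch<mission>"
    (a hypothetical encoding format).
    """
    hits = []
    for prev, curr in zip(intexecs, intexecs[1:]):
        if prev.startswith("look") and curr.startswith("sch"):
            if prev[len("look"):] == curr[len("sch"):]:
                hits.append(curr[len("sch"):])
    return hits


sample = ["look1240026-w", "sch1240026-w", "schr54-w", "lookx1-w", "schx2-w"]
print(look_before_schedule(sample))  # ['1240026-w']
```

Running the same check across every participant's encoded data is one way to confirm that the pattern is general rather than specific to one user.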

This implies that these two steps should be combined in some way, to reduce the number of
actions and to make the engagement more direct. Possible solutions include identifying the key
information from the edit box and putting it in the documentation line, or, if the information is
varied or too long, putting an approve button in the edit dialog box (which also closes the box), so
the command can be done immediately after looking, shortening the number of steps. These
indicators of indirectness are good examples showing that a lack of user errors does not imply a
direct engagement is occurring. This is also a third, distinct type of repetition from the two forms
already discussed. The other examples involved the need to apply a single action to multiple
objects, whereas this one reflects the need to improve the display of information and to combine
action sequences. Finally, it illustrates the need for using real users in testing, since the USI
"expert" did not exhibit this same behavior pattern.

Some key indicators for all the users' "schedule missions" goal, including those discussed so
far, are:


Indicator: Repetitive sequences for applying the approve command to missions.
Potential problem: Cannot select groups of objects for application of a single command.

Indicator: Repetitive sequences for applying the approve command to parts of a single mission.
Potential problem: System does not consider a multi-part mission as an object for the case of
applying scheduling commands.

Indicator: An abort while trying to bring up all parts of the "dact" mission on the display.
Potential problem: System does not consider mission parts as an object for the case of finding
the whole mission.

Indicator: An extra intention to execute required to "look" when the task intention is to schedule
a mission.
Potential problem: Information in the dialog box is often required before the mission can be
approved. To increase the feeling of directness, the two steps should be combined in some
manner.

Indicator: Perceptual/execute errors.
Potential problem: When the missions were physically displayed too close together, users would
select the wrong one. There was no way to differentiate missions when the labels were very
small.

Indicator: Execute error, many actions for recovery.
Potential problem: A user selects deny from the menu rather than approve, which is adjacent.
Lack of undo causes the user to perform multiple actions to fix the error. In addition, mission
icons were often accidentally moved, and users then had to manually reposition them. Two
problems are that the icons are too sensitive and that there is no undo feature, resulting in
multiple actions to undo a previous action.

Indicator: An extra task intention required to "see schedule" or to find the next unscheduled
mission; this occurred with high frequency (16, 10, 15) and many actions per step.
Potential problem: The system provides no way on the main screen to allow the user to
automatically "jump" to the next mission, next unscheduled mission, next mission part within a
mission, next conflict, etc. All movement between icons is by manually searching and
manipulating.

Indicator: Errors in evaluation of task completion.
Potential problem: The system provides no indicator on the main screen of the number of
unscheduled missions remaining within the schedule period; many users thought they had
finished scheduling all requests when they had not. This information was available, less
directly, elsewhere in the system.


3.6.3  Summary  Data  Table  Indicators  for  Goal  4,  Creating  Folder  Fightwing 

As a final example of how indicators of potential problem areas can be extracted from the
data, we will look at the summary tables for creating the folder named fightwing. A folder is a
group or collection of airspaces created by the user. The folder function allows single
actions to be applied to many objects at once. For example, rather than selecting individual
airspaces to be displayed on the main screen, a folder can be selected for display, and all the
airspaces in the folder will be displayed. This is a good function, as it minimizes repetition of
actions. We will soon see, however, that its implementation is not so good.

In the test scenario, users were asked to create a folder, name it fightwing, and add four
airspaces to it. The first indicator of indirectness we see from the expert's summary data table
(figure 12) is the high number of actions, 18, required for the intention to execute the adding of
the four proper airspaces to the folder. Looking back at the articulatory-level encodings, we see
that many airspaces are being selected and deleted from the folder. The way the system
implements this function is to include any airspaces currently on the main display in the folder
when it is first created. If some or all of these airspaces are not wanted in the folder, they need
to be individually selected and deleted (itself a repetitive action, as these actions can only be done
on individual objects). This resulted in more actions to remove unwanted airspaces than actions
required to add wanted airspaces. There are several options for increasing the directness here.
One is to not have displayed airspaces default into the folder. Assuming that the system
designers had a good reason for implementing the folder function this way, an alternative way to
reduce the number of actions and make this more direct would be to have a function which clears
the default airspaces with one action. Or, multiple selected airspaces could be deleted or added at
once. A more complete redesign would involve directly dragging wanted selected airspaces into
a folder icon, rather than selecting from a list and using a button.
Figure 12. Summary Data Table for User-Interface Expert on Creating Folder Fightwing

Examining the users' data will provide us with information on how direct users found the
engagements required to perform this task. The summary data for the create folder fightwing
goal for participants 2 and 3 is shown in figures 13 and 14. An indicator of a problem is
apparent in participant 2's data. In the intention to execute column we see a "checkfight" step,
which takes three actions. Also, the createfolder step itself took 2 actions, as is shown in the #
actions per intexec column, rather than the one required, and the participant had an error in
interpretation. From the comments column, we see that the analyst who created the summary
tables noted that after the user selected the create folder button the first time, the system
provided no response. The user assumed nothing had happened, as the system state did not seem
to change, and pressed the button again, still trying to execute her intention. The system had
indeed created a folder the first time, but since it provided no indication of its change of state, the
user erroneously concluded that nothing had happened. The user was finally forced to perform
extra actions as part of an extra step to obtain the information necessary to correctly interpret the
system state and determine whether she had moved closer to her goal.

Participant 3 (figure 14) had even more difficulty creating a folder than participant 2. This
participant was the least familiar with the system. The user had difficulty in locating the correct
menu command to open the correct dialog box. The user would eventually learn this with
training, but looking at the menu hierarchy, we see that the correct command is nested
hierarchically within a different command, making it difficult to locate by searching through the
menus. Given that there is plenty of space for commands, the use of hierarchical menus could
probably be eliminated. Looking at the frequency count for "openfaf", we see that it takes the
user three tries to open the correct dialog box within which to create folders. When the user
finally found the right dialog box and performed close to the correct sequence of actions, he
then closed the dialog box without pressing the create button, and nothing was created. Given
the lack of feedback to this action, the user did not notice that nothing was created. This is a case
of an inexperienced user interacting with a poor design to create a very poor sequence of
engagements.
Figure 13. Summary Data Table for Participant 2 on Creating Folder Fightwing.
Figure 14. Summary Data Table for Participant 3 on Creating Folder Fightwing.

There are numerous examples of how indicators can be extracted from the summary data
tables to provide information on indirect interactions; only a few examples were shown here.
Next, we will conclude with a brief discussion of the errors broken down by stages.

3.6.4  Error  Indicators 

Appendix D contains all of the errors made in each stage, grouped by type, with the number
of times each participant made that error. Errors are obviously useful indicators of USI
problems. Each error listed in the error summary in the appendix should be examined for
potential USI improvements. That does not mean that all command names, locations, and
sequences causing difficulty should be changed. Indicators of a serious problem requiring
changes would be multiple participants making the same error, the same participant repeating an
error, or an error having serious consequences. Comparing errors across participants also
provides much information on the participants' skill levels.
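The three criteria above (several participants making the same error, one participant repeating it, or serious consequences) can be applied mechanically to an error summary like the one in Appendix D. In this sketch, the input format (a mapping from error type to per-participant counts) and the analyst-supplied set of error types with serious consequences are assumptions for illustration. The sample summary is invented.

```python
def serious_errors(summary: dict[str, dict[int, int]],
                   severe: set[str]) -> set[str]:
    """Flag error types that meet any of the three seriousness criteria.

    summary maps error type -> {participant number: times made}.
    severe is an analyst-supplied set of types with serious consequences.
    """
    flagged = set()
    for error, counts in summary.items():
        made_by = [p for p, n in counts.items() if n > 0]
        repeated = any(n > 1 for n in counts.values())
        if len(made_by) > 1 or repeated or error in severe:
            flagged.add(error)
    return flagged


# Hypothetical error summary, not data from this study.
summary = {
    "selects deny instead of approve": {1: 1, 3: 1},
    "cannot locate find command": {2: 3},
    "typo in folder name": {4: 1},
}
print(sorted(serious_errors(summary, severe=set())))
# ['cannot locate find command', 'selects deny instead of approve']
```

Errors that are not flagged still deserve a look. The flags only order the review, consistent with the point that not every difficulty warrants a change.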

Some participants, for example, were not proficient with the mouse and made many execute
errors involving the use of the mouse. The participants also were not skilled typists and
had a total of 65 typos. It is up to the system designers to decide whether these problems will
diminish with practice or whether the system can be modified to aid the unskilled input device user.

Errors in action specification were a good indicator of the user's level of experience with the
system, as well as of articulatory directness. Participant 3, one of the users who was least
familiar with the system, made many action specification errors, as would be expected.

Classifying  Errors  by  Stages 

Classifying errors by the stage of user activity is much more difficult than just identifying
input and execution errors, as is typically done; the benefit is the greater diagnostic power of the
classified errors. Classification may be difficult because the observable symptoms of errors of
different types can look the same. Classifying an error involves taking into account the
user's past performance at the time the error was made, his future behavior, when the error was
noticed and recovered, as well as his protocols, which reflect his mental processes. This analysis
would be difficult to do without the combined protocol and history log files. Action specification
and execute errors can look similar. If, for example, a user selects the "deny" command from
the schedule menu but immediately tries to undo the results of that command and then selects
"approve", which is an adjacent menu command, we would classify the error as an execution
error. It was clear that the user intended to execute the approve action; the user never thought
that deny was the correct action to accomplish the intention. As confirmation, there may have
been a protocol stating that selecting deny was not the intention. If the user selected "deny" by
accident and did not realize it right away, it would be harder to interpret whether or not this was
the intended action. Looking ahead to whether he detects the error later and changes it to an
approve would help confirm that it was unintended.

Another case of similar symptoms with different roots would be if a dialog box was opened
which did not accomplish the stated intention but whose command was adjacent to the correct
one. A determination of whether the user thought the opened dialog box was correct would have
to be made to know whether it was an execute error or an action specification error. If the user
had opened the correct dialog box in the past to accomplish that intention, and in this instance
s/he immediately corrected the problem, we would assume an execution error was made. If the
user continued with a wrong sequence of actions, continued to select incorrect options, and had
never performed the correct sequence of events, we would classify the error as an action
specification error. An interesting case occurs when a user has correctly performed a sequence
of actions in the past and then makes an error of omission. For example, one user, when
changing the date/time information in the date dialog box, neglected to change the duration. Since
he had correctly done so previously, we classified this as an execute error; we felt he knew and
intended the correct sequence of actions but forgot to execute one action. This type of error is
comparable to typing a word with an incorrect spelling. Was the error that the user did not know
the correct spelling and the typing was intentional, or was the correct spelling known but the
word was typed incorrectly? One is an error in spelling, while the other is an error in typing. If the
word had been spelled correctly previously, you would suspect a manual typing error.
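The decision rules in the two paragraphs above can be summarized as a small function. This is a simplification: the boolean evidence fields are hypothetical stand-ins for judgments the analyst makes from the combined protocol and history files, and ambiguous cases are returned as "undecided" rather than forced into a category.

```python
from dataclasses import dataclass


@dataclass
class ErrorEvidence:
    """Analyst judgments about one error (hypothetical fields)."""
    performed_correctly_before: bool  # correct sequence seen earlier in the log
    corrected_immediately: bool       # user undid or fixed it right away
    protocol_says_unintended: bool    # e.g. a remark that deny was not intended
    continued_wrong_sequence: bool    # kept choosing incorrect options


def classify_execute_vs_spec(e: ErrorEvidence) -> str:
    # Knew the right sequence and slipped (including omissions): execution error.
    if e.protocol_says_unintended or (
        e.performed_correctly_before
        and (e.corrected_immediately or not e.continued_wrong_sequence)
    ):
        return "execute"
    # Never did it correctly and persisted with wrong actions: specification error.
    if not e.performed_correctly_before and e.continued_wrong_sequence:
        return "action specification"
    return "undecided"


slip = ErrorEvidence(True, True, False, False)   # deny chosen, then undone
print(classify_execute_vs_spec(slip))            # execute
```

The omission example (the duration field left unchanged by a user who had set it correctly before) falls into the first branch, matching the classification given in the text.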

The classifications were performed as follows. Errors in intention, or mistakes, occurred
when the user intended to do something which would not move him closer to the goal. Many of
these involved forgetting or misinterpreting the task instructions. For instance, many users
scheduled requests for a period of only four days rather than five, or incorrectly identified his/her
agency name. The intention error served to alert us that what they were doing was not expected,
but we would then evaluate their performance on the remaining stages without penalty (i.e., we
would not classify all their behaviors within the "wrong" tasks as errors). We would, however,
evaluate the success of their endeavor as incomplete or wrong, depending on the situation.
Errors in intention, therefore, do not reveal much about the USI design, but rather about how
well the users followed and interpreted the task scenario descriptions.

Errors in action specification concern the sequencing and appropriateness of user input
actions for a given intention to execute, and are one of the indicators of articulatory indirectness.
These included instances of the following error types: presses action button before filling in the
necessary data, performs button actions out of sequence, does not locate menu item on first try,
does not recall/execute correct sequence of events, mixes up two dialog boxes but recognizes it is
the wrong one once opened, wrong concept of button/object functionality, etc. Classifying action
specification errors was fairly straightforward once the intention to execute information was
known. The analyst also needs to know the correct or acceptable sequence of actions for every
type of execution. Again, this analysis requires detailed information on all user inputs as well as
the intention to execute associated with the inputs, which is obtainable from the protocols.

There were only eight errors attributed to perceptual difficulty. These included selecting the
wrong mission icon because the missions were too small with the current schedule scale setting,
not noticing that a dialog box was already open and trying to reopen it, and taking actions which
cause changes to the time bar that go unnoticed. Perceptual errors occur when the contributing
cause of the error appears to be imperceptible or unnoticed information.

Errors were classified as interpretation errors when the displayed information was judged to
be perceivable but the user did not correctly extract its meaning, or did not correctly judge the
system state. For example, if a mission turned red and the user failed to interpret this as the
denied state, it was an error in interpretation. One major contributor to this error is a lack of, or
poor, system feedback, which causes a wrong interpretation of the system state. Note that once an
error is determined to have occurred at an earlier stage, we do not label as errors the following
stages, which may be incorrect due to the earlier error.
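The rule that only the earliest failing stage in an engagement is labeled, and later stages that may merely inherit that failure are not, can be expressed compactly. The stage order follows the stages of user activity used in this section. Representing a single engagement as the set of stages the analyst judged to have gone wrong is an assumption made for the sketch.

```python
from typing import Optional

STAGES = ["intention", "action specification", "execution",
          "perception", "interpretation", "evaluation"]


def label_error_stage(failed: set[str]) -> Optional[str]:
    """Return the earliest stage judged to have failed in one engagement.

    Later stages are not labeled, because they may be wrong only as a
    consequence of the earlier error. Returns None if nothing failed.
    """
    for stage in STAGES:
        if stage in failed:
            return stage
    return None


print(label_error_stage({"interpretation", "evaluation"}))  # interpretation
```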
Finally, errors are classified as errors in evaluation when users think they have accomplished
their intention and they have not, think they have not made progress toward their goal when they
have, or are confused because what did happen was not what they expected. When an error was
made in a different stage and not immediately noticed, we debated whether this should be
considered an error in evaluation as well. We decided instead that our labeling of the success of
the endeavor (incomplete or wrong if it contained an uncorrected error) would reflect the
unnoticed error, and we did not call it an error in evaluation. The success of the endeavor and the
fact that the error was not immediately recovered from are themselves indicators.

The classification-by-stages process adds information by assessing at which stage
the error occurred and by differentiating errors with similar observable symptoms. It involves
making assessments of the users' mental activities and pinpointing the most likely stage at which
the error occurred. This is a new concept, as user input activities are all that is usually studied.
Applying the classifications consistently can be difficult, but it gets easier with practice.

System  Design  Implications  Based  on  Error  Analysis 

Error frequencies are shown in table 9 below. As we have noted with the other high-level
summary measures, knowing just the error total alone, e.g., 317, would not provide much
diagnostic information.

Table  9.  Frequency  of  Errors  by  Stage  of  User  Activity. 


                              Participant #
Errors in:                1     2     3     4     5     Total
Intention                 9    10     5     5     0       29
Action specification     20    13    48    35     1      117
Execute/repositions      31    29    19    36     9      124
Perception                1     1     4     2     0        8
Interpretation            6     6     4     3     0       19
Evaluation                7     2     4     6     1       20
Totals                   74    60    84    88    11      317
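Because Table 9 carries both row and column totals, its marginals can be cross-checked by re-adding the cell counts. The sketch below uses the counts exactly as they appear in the table and reports any total that does not match.

```python
STAGE_COUNTS = {  # per-participant counts (participants 1-5), as in Table 9
    "Intention":            [9, 10, 5, 5, 0],
    "Action specification": [20, 13, 48, 35, 1],
    "Execute/repositions":  [31, 29, 19, 36, 9],
    "Perception":           [1, 1, 4, 2, 0],
    "Interpretation":       [6, 6, 4, 3, 0],
    "Evaluation":           [7, 2, 4, 6, 1],
}
STATED_ROW_TOTALS = {"Intention": 29, "Action specification": 117,
                     "Execute/repositions": 124, "Perception": 8,
                     "Interpretation": 19, "Evaluation": 20}
STATED_COLUMN_TOTALS = [74, 60, 84, 88, 11]
STATED_GRAND_TOTAL = 317


def check_marginals():
    """Return (row mismatches, column mismatches, recomputed grand total)."""
    row_bad = {stage: (sum(c), STATED_ROW_TOTALS[stage])
               for stage, c in STAGE_COUNTS.items()
               if sum(c) != STATED_ROW_TOTALS[stage]}
    columns = [sum(col) for col in zip(*STAGE_COUNTS.values())]
    col_bad = {p: (got, want)
               for p, (got, want) in enumerate(zip(columns, STATED_COLUMN_TOTALS), start=1)
               if got != want}
    return row_bad, col_bad, sum(columns)


rows, cols, grand = check_marginals()
print(rows)                         # {}
print(cols)                         # {2: (61, 60), 4: (87, 88)}
print(grand == STATED_GRAND_TOTAL)  # True
```

With the counts as transcribed, every stage total and the grand total of 317 agree. The participant 2 and 4 columns, however, re-add to 61 and 87 rather than the stated 60 and 88. A single cell in each of those columns may have been misread, so the per-participant totals should be used with some caution.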

There were quite a few errors caused by the current USI design where improvements should
be considered. For example, users had difficulty in locating the find and edit commands on the
menus. There were multiple errors in action specification on the folder task. The date and the
layout dialog boxes were often confused. The button labeled "change screen" appears to be
ambiguous, resulting in a variety of errors. There was a large variety of execution errors, each
of which should be assessed for improvements.
There were relatively few perceptual errors, only eight. The other indicator of perceptual
activity, however, the frequency of actions taken to improve perceptibility, was quite high.
There were 37 instances of participants moving the time bar to improve perceptibility of the
display, and 42 instances of setting the screen to improve perception. This indicates there were
many occasions when users could not see information well on the display. Additional user aids
to help improve screen perceptibility should be considered. These could include a zoom
function, indicators when there are overlapping missions, adding the mission name to the
documentation line, and possibly a warning when the screen schedule period is set to be so long
that mission perception will be impossible.

There were interpretation errors in reading the displayed schedule period and the time
bar. Finally, for the evaluation errors, there were many cases of users thinking all the missions
were scheduled when they were not. A "number of unscheduled missions" indicator would help
here. As noted previously, errors in intention are not really indicative of USI design problems,
but rather of human performance problems. This system supports an ill-defined creation task: to
create an acceptable schedule. The information displayed was not very complex; there were
few information codes and very few icons. Other systems which support tasks involving
interpretation of complex graphical images may result in many more errors on the evaluation side
of the activity cycle than did this particular application.

In general, all errors should be evaluated and used in conjunction with the other indicators to
determine whether a USI change is warranted.

3.7  SUMMARY 

The application of the encoding scheme to the collected usability data was very useful in
assessing the directness of the engagements of the user-system interface. The data was in a form
appropriate for the analysis of user engagements, allowing us to assess the interactive nature of
HCI. We felt that much more information was available on the HCI process when protocols
were combined with history files and encoded than if we had used any one technique alone. The
combination of real users' actual task intentions with detailed information, such as the number of
actions to perform each engagement and the types of error per intention to execute, allows us to
measure the directness of those engagements. We were able to clearly demonstrate that
traditional high-level performance measures such as task time, task completion, and error
frequencies alone are inadequate for diagnosing USI improvements. With the new encoding
technique, indicators of inefficient engagements are readily apparent. Patterns in the data which
occur at different levels provide information on different types of system design problems. The
user interface expert's data shows basic system design problems; however, the interface expert
may not have realistic task goals and strategies. The actual users' data indicates how direct real
users with real intentions find the system to use, as well as revealing much information on users'
experience levels and individual interaction problems. Finally, the data is in a form amenable to
quantitative analysis, removing much of the ambiguity which results from methods such as
observation only. This was only the first step, however, and much remains to be done to
improve this methodology and make it more efficient.
SECTION 4


FUTURE  WORK  AND  CONCLUSIONS 


The results obtained to date on measures of user-system interface effectiveness are very
promising. We have, for the first time, a method which allows us to obtain measures of the
directness of user engagements with a system. We have successfully integrated protocol data
with history file data, for a complete and useful picture of HCI activity. We have created a
theory-based encoding scheme which provides a method for quantitative analysis of the data.

We have created an error classification scheme based on the stages of user activity model, which
provides more information than traditional error frequency measures can: it indicates at which
stage in the human information processing cycle an error occurred and how the system might be
fixed to prevent it. Indicators of USI effectiveness extending beyond errors and time have
been proposed and found to be useful. We have successfully shown that a USI engagement can
be error-free but still not be direct, and that new measures and indicators such as those proposed
here are required for a complete evaluation. The measures are also in a form which allows for easy
comparison across subjects. We have the ability to determine whether difficulties are due to a
single user's inexperience or whether problems can be attributed to the system design.

We still have, however, many more areas to explore, both in terms of the measures and
indicators and in the process for integrating the data and applying the encoding scheme.

4.1  MEASURES  AND  INDICATORS 

We need to do several things in the area of refining the USI measures and indicators. First,
we need more rigorous definitions of the different levels of the encodings: when is something a
task intention as compared to an intention to execute? While we tried to be consistent in our
application of these terms, it was difficult, particularly as this was the first time we applied the
scheme. The same problem holds with regard to the level of detail for the intentions to execute.
Sometimes all actions within a dialog box were considered to be a single intention to execute,
and sometimes particular actions were broken out separately. This may need to be flexible, based
on the USI areas of interest.

We would like to continue to work on the definitions and names for the different USI
indicators. These concepts are appealing because they provide a taxonomy that different
usability specialists could use to discuss similar kinds of problems across systems. Rather than
being forced to work with specific system problems, problems can be generically classified and
eventually mapped to known solutions. With an ability to define a method and measures, we can
specify a method for contractors to conduct usability studies.


There seem to be many kinds of repetitions which are indicators of different types and levels
of problems. We would like to classify all of these various kinds of repetitions and determine
what they imply for the system design.

Finally, we need to apply the encoding scheme to a different system to ensure it is generic
across systems, and continue to refine it.

4.2  EXPLORING  OTHER  ANALYSIS  ROUTINES  AND  THEIR 
EFFECTIVENESS 

We need to continue investigating other analysis routines and their usefulness. SHAPA, for
instance, has some built-in routines for calculating frequencies, matrix analyses, lag sequential
analyses, and frequency of cycles. The frequency routines were useful because the most
frequently used commands could be identified. This aids in assessing how many actions are too
many. For instance, the edit/view dialog box had frequencies ranging from 24 to 80 across
participants. We would expect this box to be easily accessible via shortcuts. It just so happens
that it is: users can double click on a mission icon to bring up the corresponding edit dialog box
for that mission.
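A frequency routine of the kind described above can be sketched in a few lines. This is a minimal sketch, not SHAPA's routine: it assumes each encoded action is available as a (predicate, name) pair such as ("COMMAND", "edit"), and the helper `shortcut_candidates` and its threshold are hypothetical, showing one way frequently used actions might be flagged as candidates for a shortcut.

```python
from collections import Counter

# Hypothetical record format: each encoded action is a (predicate, name) pair,
# e.g. ("COMMAND", "edit"). This is not SHAPA's own file format.

def action_frequencies(events):
    """Count how often each (predicate, name) pair occurs in one session."""
    return Counter(events)

def shortcut_candidates(freqs, threshold):
    """Pairs used at least `threshold` times, most frequent first.

    Such actions are candidates for a shortcut, e.g. a double click.
    The threshold is an analyst's choice, not a value from the study.
    """
    return [pair for pair, n in freqs.most_common() if n >= threshold]

session = [
    ("MENU", "Mission"), ("COMMAND", "edit"), ("BUTTON", "Edit-cancel"),
    ("MENU", "Mission"), ("COMMAND", "edit"), ("FIELD", "Edit-sttime-edit"),
    ("BUTTON", "Edit-edit"), ("MENU", "View"), ("COMMAND", "date"),
]
freqs = action_frequencies(session)
print(shortcut_candidates(freqs, threshold=2))
```

Run once per participant, the same count would show the spread across participants for any given dialog box.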

The problem with routines such as frequency of cycles was illustrated earlier. The
routines only work on the predicates, so the patterns found are very high-level repetitions.
For instance, the cycle Intexec -> Menu -> Command -> Button -> Evaluate is useful in that you see
a dialog box being opened and immediately closed with no actions taking place inside it, but
you do not know if it was the same dialog box being opened repeatedly. On the other hand, a
sequence List-select -> Button -> List-select -> Button -> List-select -> Button would be
interesting because it suggests an inability to apply a single action to many selected items in a list.
If we took it to a lower level, it would not show up as a pattern, because each selected item in the
list would be different. A more sophisticated pattern recognizer which allows wildcards is
required. Identifying the different types of repetitions that could occur would allow us to select
the techniques which best identify the various types. We briefly looked at another tool called the
Maximal Repeating Pattern analysis tool (Siochi, 1991), but that too found repetitions only at one
level.
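The kind of wildcard matching called for above can be illustrated with a small sketch. It assumes a hypothetical lower-level encoding in which each list selection also records the item chosen (e.g. "Folder-l/Yankee 1"); the function names are illustrative and are not part of SHAPA or the Maximal Repeating Pattern tool. An exact pattern misses the repeated select-then-add cycle because the items differ, while a wildcard pattern finds it.

```python
from fnmatch import fnmatchcase

def matches(event, pattern):
    """True if each field of an encoded (predicate, argument) event matches the
    corresponding shell-style pattern field, where '*' matches any text."""
    return all(fnmatchcase(field, pat) for field, pat in zip(event, pattern))

def find_repeats(events, pattern, min_repeats=2):
    """Return (start_index, repeat_count) for each place where the multi-event
    `pattern` occurs back to back at least `min_repeats` times."""
    k = len(pattern)
    runs = []
    i = 0
    while i + k <= len(events):
        count, j = 0, i
        # Extend the run one whole pattern at a time while it keeps matching.
        while j + k <= len(events) and all(
            matches(events[j + m], pattern[m]) for m in range(k)
        ):
            count += 1
            j += k
        if count >= min_repeats:
            runs.append((i, count))
            i = j  # skip past the run just reported
        else:
            i += 1
    return runs

# Hypothetical lower-level encoding of adding three SUAs to a folder one by one.
events = [
    ("COMMAND", "admin-fold"),
    ("LIST.SELECT", "Folder-l/Yankee 1"), ("BUTTON", "Folder-add"),
    ("LIST.SELECT", "Folder-l/Yankee 2"), ("BUTTON", "Folder-add"),
    ("LIST.SELECT", "Folder-l/India"), ("BUTTON", "Folder-add"),
    ("BUTTON", "Folder-close"),
]
exact = [("LIST.SELECT", "Folder-l/Yankee 1"), ("BUTTON", "Folder-add")]
wild = [("LIST.SELECT", "Folder-l/*"), ("BUTTON", "Folder-add")]
print(find_repeats(events, exact))  # [] because each selected item differs
print(find_repeats(events, wild))   # [(1, 3)]
```

The same function also covers the high-level case: a pattern such as [("LIST.SELECT", "*"), ("BUTTON", "*")] ignores the arguments entirely, so one matcher can look for repetitions at more than one level.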

4.3 A TOOL FOR AIDING THE APPLICATION OF THE METHODOLOGY

One apparent drawback to this method is the number of steps and the time required for data
transformation and integration, as well as the manual extraction of indicators and numbers of
actions, steps, and task intentions. The most tedious tasks were transcribing and typing verbal
protocols, reviewing the video tapes multiple times, preparing files for SHAPA, and entering
data into SHAPA.
SHAPA also had many limitations which caused unnecessary work. One limitation of
SHAPA was the size of data files it would accept. SHAPA designers claimed files up to 64K
were acceptable, but we experienced difficulty with files over 15K in size. On average, each
participant's data file in our study was about 150K; therefore each data file had to be split into
separate files. The splitting of data files also required manual collation of each separate file's
generated reports in the analysis stage. While using SHAPA, other inconveniences were
encountered. Once a file has started to be encoded, the protocol file cannot be edited. SHAPA
does allow splitting and combining of lines, but in order to insert a line, an existing line must
actually be split so that at least one letter is left on both lines. SHAPA does not have any copy,
cut, or paste feature, which would be helpful. The reports also have limitations. Frequency of
cycles only runs on one predicate; tracking the cycle between two predicates would provide useful
information. The value lists do not provide correlation between entities within a predicate (those
separated by a comma), which is a huge drawback of this software. One other quirk of the
software is that upon completion of encoding a file, the file must be closed and then reopened to
run accurate reports.
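The missing correlation between entities within a predicate can be computed outside SHAPA. The sketch below assumes the encodings are available as plain strings in the Appendix B notation, written with no space before the parenthesis (e.g. MENU(Mission, m)), and cross-tabulates the first two comma-separated arguments of one predicate, for example to see which menus are reached by mouse versus keyboard. The function name and parsing are illustrative assumptions; this does not read SHAPA's own files.

```python
from collections import Counter

def cross_tab(encodings, predicate):
    """Count co-occurring (first argument, second argument) pairs for one predicate."""
    table = Counter()
    prefix = predicate + "("
    for enc in encodings:
        enc = enc.strip()
        if enc.startswith(prefix) and enc.endswith(")"):
            # Arguments within a predicate are separated by commas.
            args = [a.strip() for a in enc[len(prefix):-1].split(",")]
            if len(args) >= 2:
                table[(args[0], args[1])] += 1
    return table

log = [
    "MENU(Mission, m)", "COMMAND(edit, m)", "MENU(Mission, k)",
    "MENU(View, m)", "MENU(Mission, m)",
]
for (name, function), n in sorted(cross_tab(log, "MENU").items()):
    print(f"{name:8} {function}  {n}")
```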

Due to all of the limitations of SHAPA, which was never intended to support this
particular method, we plan to specify requirements for a tool which will be dedicated to
performing this method. The tool will be multi-media in nature and will aid in integrating the
history file and the users' intentions. We may be able to remove the step of transcribing all of
the users' protocols and instead extract just the information needed for the intentions, evaluations,
etc. Also, with a good tool, we may be able to do some encoding in real time while observing
subjects. All of these areas will be examined in the coming year.

4.4  ASSESSING  THE  EFFECTIVENESS  OF  PERCEPTUAL  ACTIVITIES 

Two of the stages of user activity could not be assessed as completely as we would have
liked using the described methods of data collection. To understand the perceptual processing
and interpretation of the display output, we basically relied on errors in these stages and the
frequency of input actions to aid in improving processing in these areas. We were not really
measuring perception directly, as could be done by identifying the number of displayed data items
visually processed, or by other measures of perception. Display output is becoming more graphical
in nature, and work to evaluate the effectiveness of graphics and imagery needs to be
performed. This may involve different types of data collection devices such as eye-trackers.
4.5 CONCLUSIONS


This research has provided us with a much better understanding of what is required to do a
complete user-system interface evaluation. We now have a theory-based framework within
which concepts such as semantic and articulatory distance can be measured, and we have created
operational, working definitions and indicators for these concepts. We were successful in
integrating protocol and history data for a complete description of HCI activity. We created a
generic encoding scheme for abstraction of data. We validated the methodology by applying it to
a real prototype. We are now one step closer to being able to measure the usability of a system
in quantifiable terms.

Much remains to be done, however. The current process for applying this methodology is
time-consuming, and tools need to be developed to support the process. We need to apply the
method to another system to confirm the measures, and we need to continue to refine the
indicators as well. These are the goals for this project as it continues through FY93.
APPENDIX A


TASK  SCENARIO 


For the following scenario you will be acting as a scheduler for the Phoenix Agency. The
Phoenix Agency has a number of Special Use Airspaces (SUAs) for which you will be
responsible. These SUAs are: Canyon Run, Yankee 1, Yankee 2, India, W-556A, W-556B,
W-556C, R-7221, R-7222, and R-7223, which is subdivided into R-7223N, R-7223S,
R-7223E, and R-7223W. All of your airspaces are active or available for missions to be
scheduled into them Monday through Friday from 0600 EST (1100 Z) to 1800 EST (2300 Z),
except for India, which is available 24 hours per day.

You have access to viewing and requesting SUAs in other agencies, but you do not have
authorization to schedule those airspaces.

1) You are planning a schedule for the week of 13-17 April 1992. All of the work done at the
Phoenix Agency is done in EST. Set the screen start date and time appropriately.

2) Look at the requests for the airspaces you control; approve, deny, or edit them as you deem
appropriate. You cannot accept any conflicts.

3) Since you will be entering a number of missions that involve the same airspaces, create a
folder named FIGHTWING that contains the following airspaces: Canyon Run, Yankee
1, Yankee 2, and India.

4)  Create  another  folder  named  BOMBTEST  that  contains  the  following  airspaces:  R-7221, 
R-7222,  and  all  the  airspaces  in  R-7223. 

5)  Remove  India  from  folder  NIGHTRUN. 

6) The attached requests have arrived by fax. Input them into the MAMS system as
approved missions. If possible, resolve any conflicts. You may do this by changing the
start time of a mission, denying the mission, or changing the airspace if necessary. You
may not accept any conflicts.

7) A squadron that does not have access to the MAMS system has asked you to check on
their request called ASR on 13 April 92 for W-555 in Neptune NAS. Has the request
been scheduled, looked at (or not looked at), or denied? They also want to know about
missions with the following MAMS numbers: 1230000 in R-8722W and 1280000 in
W-554. Write the status of the missions on the back of this paper and set the paper aside to
be faxed to the squadron.
8) Since ASR has been denied, the squadron has asked you to change the time of the
request to 13 April 92 1300 EST.

9)  You  have  been  asked  to  change  Bravo77  to  a  start  time  of  0900Z.  Bravo77  has  been 
scheduled  daily  over  the  next  week  in  R-7223. 

10)  Print  the  following  reports: 

•  All  missions  for  R-7222  and  Canyon  Run  for  the  week  of  13-17  April  1992. 

•  All  missions  requested  by  Phoenix  for  the  week  of  13  April  1992. 

•  Print  Raider54  scheduled  for  17  April  1992. 
APPENDIX B


SEMANTIC  AND  ARTICULATORY  LEVEL  ENCODINGS 
Semantic Encodings
GOAL 

GOAL (Goal# - Name)

Goal#: Corresponds to scenario number
Name: Brief description

Possible encodings:

1 - setdate
2 - schedulemissions
3 - fightwing
4 - bombtest
5 - nightrun
6 - newmissions
7 - status
8 - timeasr
9 - timebravo
10 - print

INTENTION  OF  TASK 
INT.TASK (Goal# - Task# - Name)

Goal#: 1-10
Task#: 1-∞

Name: Subject to the user's understanding of the software

INTENTION  TO  EXECUTE 
INT.EXEC (Goal# - Task#.Exec# - Name)

Goal#: 1-10
Task#: 1-∞
Exec#: 1-∞

Name: Subject to the user's understanding of the software

INTENTION  EXTRA 
INT.EXT (Goal# - Ext# - Name)

Goal#: 1-10
Ext#: 1-∞

Name: Subject to the user's understanding of the software
INTENTION PERCEPTION
INT.PERCEPT (Goal# - Percept# - Name)

Goal#: 1-10

Percept#: 1-∞

Name: Subject to the user's understanding of the software

ERROR  IN  INTENTION 

ERR.INT (Error# - Goal# - Task#.<Exec#> - Name - Problem)
Error#: 1-∞

Goal#: 1-10

Task#: 1-∞

Exec#: 1-∞

Name: Subject to the user's understanding of the software

Problem: What the error is

ERROR  IN  ACTION  SPECIFICATION 
ERR.ACSP (Error# - Goal# - Task#.Exec# - Name - Problem)
Error#: 1-∞

Goal#: 1-10

Task#: 1-∞

Exec#: 1-∞

Name: Subject to the user's understanding of the software

Problem: What the error is

ERROR  IN  EXECUTE 

ERR.EXEC (Error# - Goal# - Task#.Exec# - Name - Problem)
Error#: 1-∞

Goal#: 1-10

Task#: 1-∞

Exec#: 1-∞

Name: Subject to the user's understanding of the software

Problem: What the error is

ERROR  IN  PERCEPTION 

ERR.PER (Error# - Goal# - <Task#>.<Exec#> - Name - Problem)
Error#: 1-∞

Goal#: 1-10

Task#: 1-∞

Exec#: 1-∞

Name: Subject to the user's understanding of the software

Problem: What the error is

ERROR  IN  INTERPRETATION 

ERR.INTER (Error# - Goal# - <Task#>.<Exec#> - Name - Problem)
Error#: 1-∞

Goal#: 1-10

Task#: 1-∞

Exec#: 1-∞

Name: Subject to the user's understanding of the software

Problem: What the error is

ERROR  IN  EVALUATION 

ERR.EVAL (Error# - Goal# - <Task#>.<Exec#> - Name - Problem)

Error#: 1-∞

Goal#: 1-10

Task#: 1-∞

Exec#: 1-∞

Name: Subject to the user's understanding of the software

Problem: What the error is

EVALUATION 

EVALUATE (Goal# - <Task#>.<Exec#> - Name - State)

Goal#: Corresponds to the goal being evaluated

Task#: Corresponds to the task being evaluated

Exec#: Corresponds to the execute being evaluated

Name: Corresponds to the name being evaluated

State: Abort - Abandons corresponding goal, task, or execute

Inc - Has not fully completed corresponding goal, task, or execute

Ok - Proper completion of corresponding goal, task, or execute

Wrong - Has completed corresponding goal, task, or execute incorrectly

RECOVERY  OF  AN  ERROR 
REC.ERR (Error#)

Error#: 1-∞

Any time an error has been acknowledged and recovered, it is noted as a recovery of an
error. If the experimenter verbally helped with the recovery of an error, it is noted as
REC.ERR(Error# - help). If the experimenter actually pressed the keys to help the user get
out of a bind, it is noted as REC.ERR(chip typing start) and REC.ERR(chip typing stop), and
the user does not get credit for recovery.

MISCELLANEOUS 
Q  (Miscellaneous) 

Anything  out  of  the  ordinary.  For  example,  a  crash  would  be  noted  as  Q(CRASH). 
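To analyze these semantic encodings with tools other than SHAPA, each one can be parsed into named fields. The sketch below covers only a few of the predicates defined above and assumes the fields are written separated by " - " (space, hyphen, space), with the task and execute numbers joined as Task#.Exec#; the `FIELDS` table and `parse_encoding` are hypothetical names. A free-text Problem field that itself contains " - " would need a different splitting rule.

```python
import re

# Field names for a subset of the semantic predicates in this appendix.
FIELDS = {
    "GOAL": ["goal", "name"],
    "INT.TASK": ["goal", "task", "name"],
    "INT.EXEC": ["goal", "task_exec", "name"],
    "ERR.ACSP": ["error", "goal", "task_exec", "name", "problem"],
    "EVALUATE": ["goal", "task_exec", "name", "state"],
}
ENCODING = re.compile(r"^([A-Z.]+)\s*\((.*)\)$")

def parse_encoding(text):
    """Split e.g. 'INT.EXEC(6 - 2.3 - createfly)' into (predicate, fields)."""
    m = ENCODING.match(text.strip())
    if m is None:
        raise ValueError(f"not an encoding: {text!r}")
    predicate, body = m.groups()
    names = FIELDS.get(predicate)
    values = [v.strip() for v in body.split(" - ")]
    if names is None or len(names) != len(values):
        raise ValueError(f"unexpected fields for {predicate}: {values}")
    fields = dict(zip(names, values))
    if "task_exec" in fields:
        # "2.3" -> task "2", execute "3"
        fields["task"], _, fields["exec"] = fields.pop("task_exec").partition(".")
    return predicate, fields

print(parse_encoding("INT.EXEC(6 - 2.3 - createfly)"))
print(parse_encoding("EVALUATE(6 - 2.3 - createfly - Ok)"))
```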
MENU

MENU  (Name,  Function) 

Name :  Menu  name 

Function: dc = double click
k = key
m = mouse


Possible  Menu  Names : 

File 

Folder-hier

Mission 

Schedule 

View 

COMMAND 

COMMAND  (Name,  Function) 

Name:  Command  name 

Function  :  dc  =  double  click 
k  =  key 
m = mouse

Possible  command  names  within  each  menu  name: 

File  (admin,  admin-fold,  print) 

Mission  (create,  edit,  find) 

Schedule (approve, deny, describeconf, pendreq, unschedule)

View  (date,  layout) 

Abbreviations: admin - administrative, describeconf - describe conflicts, fold - folder,
pendreq - pending requests

LIST  SELECTION 
LIST.SELECT (Name)

Name :  Name  of  selection  box 

Possible  list  select  names: 

Create 

Describeconf
Edit 
Find 
Folder - l
Folder - r
Layout - undis
Layout - dis
Print
Pendreq

Abbreviations: Folder - l - available suas (left box), Folder - r - folder suas (right box),
Layout - undis - undisplayed suas, Layout - dis - displayed suas

FIELDS 

FIELD  (Dialog  Box  -  Name  -  <Function>) 

Dialog Box: Same as command name except admin-fold is folder
Name :  Field  name 

Function  :  data  -  entered  into  a  blank  field 

del  -  deleted  information  in  field 

edit  -  entered  into  already  occupied  field  or  an  empty  field  and 
editing  took  place  while  typing 
<No function> - field selected

Possible field names within each dialog box:

Date  (date,  time,  durdays,  durhrs) 

Create, Edit (name, type, prior, ord, unit, call, #air, airtype, sua, stdate, sttime,
spdate, sptime, dur, lowalt, upalt, poc, phone, comment, label)

Find (stdate, sttime, spdate, sptime, sua, reqagency, mams#, name)

Folder  (typein,  search) 

Layout  (search) 

Pendreq (stdate, sttime, spdate, sptime, sua, reqagency)

Print (stdate, sttime, spdate, sptime, name, sua, reqagency, util)

Abbreviations: alt - altitude, call - callsign, comment - remarks, dur - duration, durhrs -
duration in hours, durdays - duration in days, name - mission name, poc - person on call,
prior - priority, ord - ordnance, stdate - start date, sttime - start time, spdate - stop date,
sptime - stop time, type - mission type, typein - creating or finding a mission field, util -
utilization, #air - # aircraft


BUTTON 

BUTTON  (Dialog  Box  -  Name  -  <Function>) 

Dialog  Box  :  Same  as  command  name  except  admin-fold  is  folder 
Name :  Button  name 

Function  :  dc  -  double  click 

<No  Function>  -  button  selected 
Possible button names within each dialog box:

Create  (create,  createcon-ok,  createcon-cancel,  cancel,  ins,  del) 

Date  (ok,  cancel) 

Describeconf (cancel)

Edit (edit, editcon-ok, editcon-cancel, cancel, ins, del)

Find (find, view, change, cancel, reqonly, approveonly, both)

Folder  (add,  rem,  openfolder,  create,  close) 

Layout (add, rem, openfolder, ok)

Pendreq (find, view, change, cancel)

Print  (view,  print,  cancel) 

Abbreviations: con - confirmation box, del - delete, ins - insert, rem - remove, reqonly -
request only

SCROLL  BARS 

SCROLL (list select in Dialog Box)

Dialog  Box :  Same  as  command  name  except  admin-fold  is  folder 

Possible  scrolls: 

Create 

Describeconf 

Edit 

Find 

Folder 

Layout 

Pendreq 

Print 

FORMS 
FORM  (Name) 

Name: Form name

Recorded any time a user hits the background of the interface, whether by mistake, by missing a
button or menu, or by misunderstanding some functional capability.

Possible  form  names : 

Create 

Date 

Datebox 

Describeconf

Edit 
Find

Folder 

Layout 

Main 

Pendreq

Print 

TIMEBAR 

TIMEBAR (Duration/Code)

Duration: The time between key press and key release. The duration may be due
to the user's decision making or to system response time.

Code: p - timebar manipulation related to perception

s  -  manipulation  related  to  scheduling 

MISSIONS 

TAPE (Name - Day of week - Time/Code)

Name :  Mission  name 

Day of week: Original requested date of mission (m, t, w, h, f)
Time :  The  time  between  key  press  and  key  release 

Code :  p  -  moving  a  mission  to  improve  perception 

s - moving a mission to schedule
r  -  reposition  of  mission  due  to  mission  slippage 

MISCELLANEOUS 
Q  (Miscellaneous) 

Anything that does not apply to the above criteria. An example is pressing a key and
getting no computer response, i.e., Q(no computer response). Another is a crash, noted as
Q(crash-recovery-start) and Q(crash-recovery-end-<evaluation>).
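The number of articulatory actions spent on each intention to execute, one of the quantities used to assess directness, can be counted mechanically once semantic and articulatory encodings are merged into one time-ordered stream. This is a minimal sketch under stated assumptions: each encoding is a (predicate, label) pair, and every articulatory encoding is charged to the most recent INT.EXEC until the next one begins, while EVALUATE and Q entries are not counted. The predicate set follows this appendix; the charging rule and the function name are illustrative assumptions, not rules stated in the encoding scheme.

```python
# Articulatory predicates from this appendix (TIMEBAR and TAPE included).
ARTICULATORY = {"MENU", "COMMAND", "LIST.SELECT", "FIELD", "BUTTON",
                "SCROLL", "FORM", "TIMEBAR", "TAPE"}

def actions_per_intention(stream):
    """Return [(intention label, number of articulatory actions)] in order.

    A repeated INT.EXEC label starts a new entry, so retries stay visible.
    Actions that occur before the first INT.EXEC are ignored.
    """
    counts = []
    for predicate, label in stream:
        if predicate == "INT.EXEC":
            counts.append([label, 0])
        elif predicate in ARTICULATORY and counts:
            counts[-1][1] += 1
    return [tuple(entry) for entry in counts]

stream = [
    ("INT.EXEC", "6-2.1-createfly"),
    ("MENU", "Mission"), ("COMMAND", "create"),
    ("FIELD", "Create-name-data"), ("FIELD", "Create-sttime-data"),
    ("BUTTON", "Create-create"),
    ("EVALUATE", "6-2.1-createfly-Ok"),
    ("INT.EXEC", "6-2.2-findconflicts"),
    ("MENU", "Mission"), ("COMMAND", "find"),
    ("SCROLL", "Find"), ("SCROLL", "Find"), ("SCROLL", "Find"),
    ("BUTTON", "Find-cancel"),
    ("EVALUATE", "6-2.2-findconflicts-Inc"),
]
for label, n in actions_per_intention(stream):
    print(label, n)
```

Comparing these counts with an expert's count for the same intention is one way the gap between actual and minimal effort might be expressed.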
SUMMARY DATA TABLES
e.11 - snapped to bar - Need to hold in final
err.exec position for a second
why did it happen?

Is there 2 areas called
remarks? 

Intention  is  to  find 
conflicts - "wish there
was  a  conflict  search" 
Searched  using  find 
mission  form 
scrolling  through 
status  of  approved 
missions - not very
direct 

Search  found  nothing 
due  to  case  sensitivity 
error  but  provided  no 
FB 

She  thought  she  was 
"Neptune".  An  artifact 
of  the  experimental 
nature  of  the  study. 

e.30, e.31, e.32, e.33
- typo - err.exec
e.49 - ??? label &
changed name - didn't
realize it - err.acsp.

e.59 - wants find, not
pendreq - R - err.acsp

e.35  -  narrow  conflicts 
to Neptune requests -
err.int. 

e.38 - week not 14-18th
- err.int.
e.39  -  narrowing  to 
Neptune  -  err.int. 
e.34  ???  -  R  - 
err.acsp 
e.57  -  problem 
evaluating error -og
feedback  -  err.inter. 

e.36  -  too  early  - 
err.int. 

OK 

OK 

OO 

OK 

OK 

OK 

OK 

OK 

oo 

OO  oo 

OK 

OK 

OK 

OK 


28 

4 

1 

7 

22 

10 


open  forms 
enterinfocon 

createcon 

enterinfofly

createfly 

pendreq 

find 

changescreen 

lookcon-t 

lookthawk-t 

movecon-t 

lookdact-t

lookcon-t 

movethawk-t 

movered-t 

movecon-t 

movedact-t 


6.7  createcon 

6.8  createfly 

6.9 findconflicts

6.10  resconf 

6.11  resconfdact/cin-i 
movecon-t 1 3 OK

movedact-t 13 OK OK e.36 - too early - err.int.




APPENDIX  D 
Error  Summary 

This appendix contains information on errors users made during the usability testing,
classified by stage. Each user was assigned a unique letter code, and each error an error
number.
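In the tables below, the count beside each instance appears to equal the number of error references listed for it (for example, "T-35,39,42; K-59" is counted as 4). A minimal sketch of that tally follows, assuming each subject code is a single capital letter followed by one or more error numbers; the function names are hypothetical.

```python
import re
from collections import defaultdict

def parse_error_refs(cell):
    """Parse a 'Subj/error #' cell such as "T-35,39,42; K-59" into
    {subject letter: [error numbers]}; hyphens, commas, semicolons and
    spaces act only as separators."""
    refs = defaultdict(list)
    subject = None
    for letter, number in re.findall(r"([A-Z])|(\d+)", cell):
        if letter:
            subject = letter
        elif subject is not None:
            refs[subject].append(int(number))
    return dict(refs)

def instance_count(cell):
    """Total number of errors referenced in one cell."""
    return sum(len(nums) for nums in parse_error_refs(cell).values())

for cell in ["T-35,39,42; K-59", "N-7, K-8,11", "K-5, N-3"]:
    print(cell, "->", instance_count(cell))
```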

Errors in Intention


Type 

Instance 

Subj/error  # 

T75q 

Misinterprets  task 
description 

4  days  scheduled 

T-l,N-i,M 

3 

Remove  W  areas 

T-2 

1 

Prints  Neptune 

T-46 

1 

Prints  r7222  for  2 
wks 

T-43 

1 

Reading  wrong 
part  of  task 

L-3,4,5 

3 

Not in EST

K-5,  N-3 

2 

Wrong  folder 

K-27 

1 

Dialog  box 
purpose 

Folder  and  layout 

1 

Memory 

Wrong  mission 
name 

T57 

1 

Wrong  scheduling 
week  (14-18  Apr) 

T-38,  K-57 

2 

Thinks  agency  is 
Neptune 

T-35,39,42; K-59

4 

Narrows  search 

N-7,  K-8,11 

3 

Wrong  dates 
searching  for 
1280000 

Moved  too 
early/wrong  time 
frame

Thought  entered 
wrong  times 

Thinks  Phoenix  is 
an SUA

N-63 

T-36, K-13,72

K-38 

L-79 

1 
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Errors  in  Action  Specification 


Type 

1 


Presses  action 
button before
filling  in  necessary 

Creating  para,  hits 
button before
entering  data. 
Doesn't  enter  info 
before  find  button 

1 

data 

N16 

1 

Hits  button  open 
folder  before 
entering  info 

N-55 

1 

Select  view 
without  info,  in 

N-8 

1 

field 

Forgets  a  button  in 

In  pendreq,  tries  to 


2 

correct  button 

view  before  find 

sequence 

In  find,  hits  view 
before  find 

N-75 

1 

Forgets  edit  button 

T-10, L-15,17,

and  closes 

19,21,  N-19,21 


Prints  before 

T-48,L-78 

viewing  in  Print 
Forgets  create 

L-7,  N-51, 54, 


button 

56 

Needs  to  reselect 
find  button  to 
reflect  changes 

L-27

1 

Can't  locate  menu 

Looks  for  pendreq 

item  (command) 

-  under  schedule 

T-6 

1 

-  under  file  admin 

N-6; 

1 

Find 

L-8,25,74, 

K58, L-15;

5 

Edit 

L-13,65

N-18,20,30,78, 

K58; 

7 

Layout 

L-36

1 

Approve 

L53,  N12 

Admin-folder 

L-77,  N47 

Date 

N-1,2,35 

Create

N-70 

1 

Deny 

K55 

1 
Correct object not

Wants  to  edit 

selected  as 

Bravo-T but still M

recipient  of 

Wants  W,  still  M 

L-18 

command/changes

Wants H, still M

L-20 

Item  viewed  is  not 
same  as  item 
selected  for  editing 

L-30 

Add button
selected before any
SUAs  in  list 
selected 

L-38

Open  folder  button 
selected before any
folders  in  list 
selected 

L-39 

Approves  without 
selecting  mission 

L-54,55,  N-13 

Airspaces  -  button 
before  list  select 

N40 

K-54 


Forgets  how  to  use 
hierarchical  menus 




Errors  in  Execution 


Instance
Errors in Evaluation